Sunday, March 29, 2020

Neural Networks on the Arduino Part 5: How to update the weights

Previously I said that since we have two sets of weights to update (= two matrices to change) we may as well write a single function and call it twice.
 
So what are the inputs and outputs of our back propagation weight adjusting C like function? I'd suggest...
  1. iNumLocalInputNodes
  2. InputValues (a vector iNumLocalInputNodes big)
  3. iNumLocalOutputNodes
  4. OutputValues (a vector iNumLocalOutputNodes big)
  5. ErrorValues (a vector iNumLocalOutputNodes big) 
  6. WeightMatrix (with iNumLocalInputNodes rows and iNumLocalOutputNodes columns)
I like to draw (pencil and paper away from the computer!) diagrams of functions I'm going to write, just to make sure I have them correct in my head. If I don't have them clear in my head I will have a snowflake's chance in hell of writing the function correctly.


Note that the inputs and outputs are not necessarily the inputs and outputs of the whole network. The above function will be called first for the hidden to output part and then again for the input to hidden part. That is the point of writing the function, so it can be used in more than one place.

Here is the code; note that I use the WeightMatrix as both an input and an output.

// Change the weights matrix so that it produces fewer errors
void UpdateWeights (int iNumInputNodes,  mtx_type* InputValues, // a col vector
                    int iNumOutputNodes, mtx_type* OutputValues, // a row vector
                    mtx_type* ErrorValues, // same size as output values
                    mtx_type* WeightMatrix) // This is an input and output
{
    // This is just to keep sizes of matrices in mind
    int iNumRows = iNumInputNodes ;
    int iNumCols = iNumOutputNodes ;
   
    // The f "horizontal" row vector is formed from
    // alpha * error * output * (1 - output) in each column
    // Initialised from errors and outputs of this layer, and so has the same
    // size as the error vector and output vector
    mtx_type f[iNumOutputNodes] ;

    for (int col = 0 ; col < iNumOutputNodes ; col++) {
        // The outputs have been created using the sigmoid function.
        // The derivative of the sigmoid is used to modify weights.
        // Fortunately, because we have the output values, the derivative
        // is easy to calculate... look up the derivative of the sigmoid
        const double SigmoidDeriv = OutputValues[col]*(1.0-OutputValues[col]) ;
        f[col] = Alpha*ErrorValues[col]*SigmoidDeriv ;
    }

    // The "vertical" column vector is the inputs to the current layer

    // Now we can do the outer product to form a matrix from a
    // column vector multiplied by a row vector...
    // to get a matrix of delta weights
    mtx_type ErrorDeltasMat [iNumRows*iNumCols] ;
    OuterProduct((mtx_type*)InputValues,
                  f,
                  iNumRows, 
                  iNumCols,
                  (mtx_type*)ErrorDeltasMat) ;

    // Now we have the deltas to add to the current matrix
    // We are simply doing OldWeight = OldWeight+DeltaWeight here
    for (int row = 0 ; row < iNumRows ; row++) {
        for (int col = 0 ; col < iNumCols ; col++) {
            int iIndex = (row*iNumCols)+col ;
            WeightMatrix[iIndex] = WeightMatrix[iIndex] +
                                   ErrorDeltasMat[iIndex] ;
        }
    }

}
In the above code Alpha is the learning rate, often between 0.1 and 0.2, and is a constant defined elsewhere in the Arduino program.

The SigmoidDeriv is the derivative of the sigmoid function, and is calculated like this:

S' = S(1-S)

Now the outputs of our layers are the sigmoids of the inputs. So the derivative is simply (the outputs * (1 - the outputs)), as shown in the code above. Simples!









