Saturday, March 28, 2020

Neural Networks on the Arduino Part 2: A start to changing the weights



In part one (read it before you read this) we saw how to create a neural network function which could run on an Arduino, but we missed out on the most important part: how to train the network. Training here means changing the weights in the two matrices which connect the three vectors.

We train the network by back propagation of errors, so the network can reduce the errors and learn to get closer to the correct answer. It is called that because
  1. The errors go backwards from the output towards the input 
  2. The errors are propagated into the network
We'll be using the MatrixMath.h functions for the Arduino, here is a simple neural network and the matrices and vectors which represent it (remember weights are stored inside matrices):



Ok, so how do we change the weights to get the outputs closer to the targets? The best explanation I have found so far is in the book by Tariq Rashid...
... so I'll follow that, translating from Python to Arduino C/C++. Here is the function from the book:


Here is the same thing in my blocky matrix-style illustration:




Stay with me. I had to draw these diagrams several times before I fully understood what I was doing.

The column vector Oj (the inputs to this layer, which are the outputs of the previous layer) multiplied by the row vector f creates the matrix of changes to add (the deltas to apply) to the 3 x 2 matrix.

(a is the learning rate and is between 0 and 1 exclusive. A high learning rate (more than 0.5) may mean the network will never find a stable set of weights. A low learning rate may mean that training takes longer. In this Arduino version I've found that 0.1 is a good setting.)

The deltaW matrix is what we'll add to the original matrix to modify its weights.

The HiddenToOutputMatrix is, obviously, also 3 x 2. And we can follow similar reasoning for the InputToHiddenMatrix; it is actually identical apart from the number of rows and columns. So shouldn't we put that inside a single function which can be called twice?

Look at the original diagram above: we can split it into two halves, and you can see that the architecture is the same; only the sizes are different:
  First layer, input to hidden, 4 inputs and 3 outputs


Second layer, hidden to output, 3 inputs and 2 outputs


You can see that there are two layers.

As I said, this means it makes sense to have a single back propagation function which can be called twice, instead of writing it all out twice. 

Note that in the Python implementation by Tariq Rashid the update of the weights by back propagation was done with a single "line" per layer. Here is one of those "lines":

self.who += self.lrate * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)),
                                   numpy.transpose(hidden_outputs))

who = weights-hidden-to-output. Why is there a "dot" in the above line? Well, here numpy.dot multiplies a column vector by a row vector, and that is the outer product. With the outer product, two vectors produce a matrix, which in our case is the matrix of changes to apply to the old matrix. For example:



It is always good to keep your feet on the ground when using matrices for neural networks, otherwise, if you're like me, you'll soon lose track of what the rows and columns are. 

In this case the number of rows in a layer's matrix is the number of inputs to the layer, and the number of columns is the number of outputs. In images:


In the diagram above you can see a column vector and a row vector multiplied together to form a matrix. You do this using the outer product. Here it is in Arduino C:


// Outer product: from two vectors, form a matrix
// C = A*B
// A is a column vector (mRows elements, vertical)
// B is a row vector (nColumns elements, horizontal)
// C must have space for mRows x nColumns elements (row-major)
void OuterProduct(mtx_type* A, mtx_type* B, int mRows, int nColumns, mtx_type* C)
{
    int ra, cb;
    for (ra = 0; ra < mRows; ra++) {
        for (cb = 0; cb < nColumns; cb++) {
            // Overwrite (not accumulate) each element of C
            C[(nColumns * ra) + cb] = A[ra] * B[cb];
        }
    }
}
 

(The Arduino language is a sort of reduced C/C++ by the way, as far as I can understand. On the other hand the Arduino is so low-cost you can imagine putting Arduino neural networks to work anywhere.)

And here's Part 3: Details and tests of the matrix multiplication.

