Sunday, March 29, 2020

Neural Networks on the Arduino Part 4: Calculation of hidden errors in a neural network


One of the things I found hardest to "believe", "understand" or "grok" was how the hidden errors are calculated in a neural network. If you've done matrix maths, the equation seems far too simple to be true. Here I'll illustrate to you, and to myself, how it works.

The problem is that while it is clear what an error is at the output of a neural network, it is not immediately clear what an error is at the hidden layer output. Graphically:


It turns out that a good approach is to use the errors at the output to set the "errors" in the hidden layer.

(By the way, you can't just copy the output errors into the hidden layer, because there is no guarantee that the number of outputs is the same as the number of hidden nodes. Besides, you'd be arbitrarily jumping over the HiddenToOutput matrix.)

So the idea is that the hidden layer errors are weighted averages of the errors at the output. We make the assumption that an error at the output has been caused by an "error" in the hidden layer. How much the error at the output has been caused by the error in the hidden layer depends on the weight connecting the two nodes.


For example, look at eH2 above: it contributed to the errors eO1 and eO2, and how much it contributed depended on the weights wa and wb. So eH2 will be a weighted average of eO1 and eO2. As explained in the book by Tariq Rashid, we do not need to use an actual weighted average: we can just use the weights directly.

In other words...

eH2 = wa*eO1 + wb*eO2
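
To make that concrete, here is a tiny C sketch of that one calculation. This is just an illustration, not the Arduino code from this series, and the values of the errors and of wa and wb are invented:

    #include <stdio.h>

    int main(void) {
        /* Example values, invented for illustration */
        float eO1 = 0.8f;    /* error at output node 1 */
        float eO2 = -0.2f;   /* error at output node 2 */
        float wa  = 0.3f;    /* weight between hidden node 2 and output node 1 */
        float wb  = 0.6f;    /* weight between hidden node 2 and output node 2 */

        /* The hidden "error" is the weighted sum of the output errors */
        float eH2 = wa * eO1 + wb * eO2;

        printf("eH2 = %f\n", eH2);   /* prints 0.120000 */
        return 0;
    }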

Luckily this can be nicely done with matrices, and the matrix of the weights for this back propagation of errors is easily created from the forward hidden to output matrix.

I'm about to demonstrate that, by the magic of matrices, the matrix which gives us the hidden "errors" from the output errors is simply the transpose of the forward hidden to output matrix! Let's look again at how the forward hidden to output matrix works:




I've arranged the weights closer to the output nodes so you can see the calculation better. Study and understand how o1 is calculated. Compare the drawing and the matrix versions in the above image.
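
In code, that forward step is just a matrix-vector multiply. Here is a minimal C sketch (not the code from earlier parts of this series; the array name, sizes and numbers are made up, and the activation function is left out to keep the focus on the multiplication):

    #include <stdio.h>

    #define NUM_HIDDEN 3
    #define NUM_OUTPUT 2

    /* hiddenToOutput[i][j] is the weight from hidden node j+1 to output node i+1 */
    float hiddenToOutput[NUM_OUTPUT][NUM_HIDDEN] = {
        {0.9f, 0.3f, 0.4f},
        {0.2f, 0.8f, 0.5f},
    };

    int main(void) {
        float h[NUM_HIDDEN] = {0.5f, 0.1f, 0.7f};  /* example hidden layer outputs */
        float o[NUM_OUTPUT];

        /* Forward pass: each output is the weighted sum of all hidden outputs */
        for (int i = 0; i < NUM_OUTPUT; i++) {
            o[i] = 0.0f;
            for (int j = 0; j < NUM_HIDDEN; j++) {
                o[i] += hiddenToOutput[i][j] * h[j];
            }
            printf("o%d = %f\n", i + 1, o[i]);
        }
        return 0;
    }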

So. That is how the forward calculation of the outputs works. How do we do the backward propagation of the errors using matrix multiplication? Here's another diagram, this time illustrating the backward propagation of errors from the output to the hidden layer:



  • I've drawn the errors flowing rightwards, from eo1 to eh1 etc. This is so you can easily relate the diagram to the matrix multiplication.
  • The o in eo2 (for example) stands for output
  • The h in eh3 (for example) stands for hidden
  • I've drawn the weights which modify the output errors to form the hidden "errors" close to the hidden error nodes
  • I've drawn the hidden error calculations at top right of the image.
  • I've drawn the matrix multiplication version in the bottom half.
  • To be sure the weights have the correct indices, try checking the weight which connects o1 (eo1) to h2 (eh2) in both diagrams. In both diagrams it is w21.
  • Note that the matrix in this diagram is the transpose of the matrix in the previous diagram.

Et voilà! That is why and how the transpose of the final forward matrix can be used to "invent" the hidden "errors" from the output errors.
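
As a final sketch, here is what that backward step looks like in C, reusing the hypothetical hiddenToOutput array from the forward sketch above (again invented for illustration; the C array indexing here is not meant to match the wjk labels in the diagrams). Notice that no transposed copy of the matrix is ever built: reading hiddenToOutput[i][j] with the output index i on the inner loop is exactly multiplication by the transpose.

    #include <stdio.h>

    #define NUM_HIDDEN 3
    #define NUM_OUTPUT 2

    /* hiddenToOutput[i][j] is the weight from hidden node j+1 to output node i+1 */
    float hiddenToOutput[NUM_OUTPUT][NUM_HIDDEN] = {
        {0.9f, 0.3f, 0.4f},
        {0.2f, 0.8f, 0.5f},
    };

    int main(void) {
        float eo[NUM_OUTPUT] = {0.8f, -0.2f};  /* example output errors */
        float eh[NUM_HIDDEN];

        /* Backward pass: eh = transpose(hiddenToOutput) * eo.
           Swapping the roles of the loop indices is the same as
           multiplying by the transposed matrix. */
        for (int j = 0; j < NUM_HIDDEN; j++) {
            eh[j] = 0.0f;
            for (int i = 0; i < NUM_OUTPUT; i++) {
                eh[j] += hiddenToOutput[i][j] * eo[i];
            }
            printf("eh%d = %f\n", j + 1, eh[j]);
        }
        return 0;
    }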


