What Is Backpropagation?
The backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent.
The weights that minimize the error function are then considered to be a solution to the learning problem.
Let’s understand how it works with an example.
You have a dataset with inputs and their desired outputs (labels):
| Input | Desired Output |
|---|---|
| 0 | 0 |
| 1 | 2 |
| 2 | 4 |
Now, here is the output of your model (output = input × weight) when the value of ‘W’ is 3:
| Input | Desired Output | Model Output (W=3) |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 2 | 3 |
| 2 | 4 | 6 |
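To make the arithmetic concrete, here is a minimal Python sketch of this step (the names `dataset` and `model_output` are illustrative, not from any library); it reproduces the ‘Model Output (W=3)’ column:

```python
# Each (input, desired output) pair from the dataset above
dataset = [(0, 0), (1, 2), (2, 4)]

def model_output(x, w):
    """The model is a single weight: output = input * weight."""
    return x * w

W = 3
for x, desired in dataset:
    print(x, desired, model_output(x, W))
# Prints model outputs 0, 3 and 6 -- the 'Model Output (W=3)' column above.
```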
Notice the difference between the model output and the desired output:
| Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 |
| 2 | 4 | 6 | 2 | 4 |
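The error columns can be reproduced the same way (again a rough sketch, assuming the same toy dataset as above): the absolute error is the magnitude of the difference between the desired and model output, and the square error is that difference squared.

```python
dataset = [(0, 0), (1, 2), (2, 4)]
W = 3

for x, desired in dataset:
    out = x * W                      # model output
    abs_error = abs(desired - out)   # absolute error
    sq_error = (desired - out) ** 2  # square error
    print(x, desired, out, abs_error, sq_error)
# Square errors: 0, 1, 4 -- a total of 5 for W = 3.
```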
Let’s change the value of ‘W’. Notice the error when ‘W’ = 4:
| Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error (W=3) | Model Output (W=4) | Square Error (W=4) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 | 4 | 4 |
| 2 | 4 | 6 | 2 | 4 | 8 | 16 |
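A quick check over the whole dataset confirms what the table shows: going from W = 3 to W = 4 raises the total square error (the helper `total_square_error` below is an illustrative name, not from any library).

```python
dataset = [(0, 0), (1, 2), (2, 4)]

def total_square_error(w):
    """Sum of (desired - input * w)^2 over the dataset."""
    return sum((desired - x * w) ** 2 for x, desired in dataset)

print(total_square_error(3))  # 0 + 1 + 4  = 5
print(total_square_error(4))  # 0 + 4 + 16 = 20 -- the error increased
```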
Notice that when we increased the value of ‘W’, the error increased.
So, clearly there is no point in increasing the value of ‘W’ further.
But what happens if we decrease the value of ‘W’?
| Input | Desired Output | Model Output (W=3) | Absolute Error | Square Error (W=3) | Model Output (W=2) | Square Error (W=2) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 3 | 1 | 1 | 2 | 0 |
| 2 | 4 | 6 | 2 | 4 | 4 | 0 |
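Now the error has dropped to zero, so W = 2 is the weight that solves this toy learning problem. This trial-and-error search is exactly what gradient descent (the delta rule) automates: instead of guessing new values of W, it repeatedly moves W a small step against the gradient of the squared error. Here is a minimal sketch of that idea; the starting weight of 3 and the learning rate of 0.1 are arbitrary choices for illustration.

```python
dataset = [(0, 0), (1, 2), (2, 4)]

W = 3.0              # start from the weight we tried first
learning_rate = 0.1  # step size (illustrative choice)

for step in range(50):
    # Gradient of the total square error sum((W*x - y)^2) with respect to W
    gradient = sum(2 * (W * x - y) * x for x, y in dataset)
    W -= learning_rate * gradient  # delta rule: step against the gradient

print(W)  # converges to 2.0, the weight with zero error
```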