How to Understand Backpropagation? Especially, How to Update the Weights?

Posted on Mar 01, 2025
#Deep_Learning_Framework

Principles of training a multi-layer neural network using backpropagation: how to understand backpropagation, and especially how the weights are updated.

This post describes the training process of a multi-layer neural network using the backpropagation algorithm. To illustrate the process, a three-layer neural network with two inputs and one output, shown in the picture below, is used:

Each neuron is composed of two units. The first unit sums the products of the weight coefficients and the input signals. The second unit applies a nonlinear function, called the neuron activation function. The signal e is the adder's output, and y = f(e) is the output signal of the nonlinear element; y is also the output signal of the neuron.
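As a minimal sketch of those two units in Python (illustrative names, not the demo code below):

import math

def neuron_output(weights, inputs, bias):
    # First unit (the adder): e = sum of weight * input products, plus the bias
    e = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Second unit (the activation function): y = f(e), here a sigmoid
    y = 1 / (1 + math.exp(-e))
    return y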

The Forward Pass

To train the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with the corresponding target (desired output) z. Network training is an iterative process: in each iteration the weight coefficients of the nodes are modified using new data from the training set, and the modification is calculated using the algorithm described below. Each training step starts with applying both input signals from the training set. After this stage we can determine the output signal value of each neuron in each network layer. The pictures below illustrate how the signal propagates through the network. The symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer; the symbols yn represent the output signal of neuron n.

Propagation of signals through the hidden layer. The symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.

Propagation of signals through the output layer.
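To make the propagation concrete, here is the first hidden neuron of the demo below computed by hand, using its initial weights (0.15, 0.2) and bias (0.35); the result matches the 0.59327 value that appears in the run output:

import math

def sigmoid(e):
    return 1 / (1 + math.exp(-e))

x1, x2 = 0.05, 0.10
e_h1 = 0.15 * x1 + 0.20 * x2 + 0.35  # adder output: 0.3775
out_h1 = sigmoid(e_h1)               # ≈ 0.5932699921071872
print(out_h1)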

In the next step of the algorithm, the output signal of the network, y, is compared with the desired output value (the target) found in the training data set. The difference is called the error signal $\delta$ of the output layer neuron.
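The demo code below measures the error with the squared loss, so the error signal is simply the derivative of that loss with respect to the output:

$$ E = \frac{1}{2}(z - y)^2, \qquad \delta = \frac{\partial E}{\partial y} = -(z - y) $$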

The Backward Pass

How $\delta$ is propagated backward:

$$ \begin{align*} &\frac{\partial E_{o1}}{\partial out_{o1}} = -(z_1 - out_{o1}) = \delta \\ &\frac{\partial E_{o1}}{\partial w_5} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} \\ &\frac{\partial E_{o1}}{\partial out_{o1}} = -(z_1 - out_{o1}) \\ &\frac{\partial out_{o1}}{\partial net_{o1}} = out_{o1}(1 - out_{o1}) \\ &\frac{\partial net_{o1}}{\partial w_5} = out_{h2} \\ &\frac{\partial E_{o1}}{\partial w_5} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial w_5} = -(z_1 - out_{o1}) \times \frac{\partial out_{o1}}{\partial net_{o1}} \times out_{h2} \\ &\frac{\partial E_{o1}}{\partial w_4} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h2}} \times \frac{\partial out_{h2}}{\partial net_{h2}} \times \frac{\partial net_{h2}}{\partial w_4} = -(z_1 - out_{o1}) \times \frac{\partial out_{o1}}{\partial net_{o1}} \times w_5 \times \frac{\partial out_{h2}}{\partial net_{h2}} \times out_{h1} \\ &\frac{\partial E_{o1}}{\partial w_3} = \frac{\partial E_{o1}}{\partial out_{o1}} \times \frac{\partial out_{o1}}{\partial net_{o1}} \times \frac{\partial net_{o1}}{\partial out_{h2}} \times \frac{\partial out_{h2}}{\partial net_{h2}} \times \frac{\partial net_{h2}}{\partial out_{h1}} \times \frac{\partial out_{h1}}{\partial net_{h1}} \times \frac{\partial net_{h1}}{\partial w_3} \\ &\phantom{\frac{\partial E_{o1}}{\partial w_3}} = -(z_1 - out_{o1}) \times \frac{\partial out_{o1}}{\partial net_{o1}} \times w_5 \times \frac{\partial out_{h2}}{\partial net_{h2}} \times w_4 \times \frac{\partial out_{h1}}{\partial net_{h1}} \times x_1 \\ &w_{5new} = w_{5old} - \varphi \frac{\partial E_{o1}}{\partial w_5} = w_{5old} - \varphi (\delta \times \frac{\partial out_{o1}}{\partial net_{o1}}) \times out_{h2} \\ &w_{4new} = w_{4old} - \varphi \frac{\partial E_{o1}}{\partial w_4} = w_{4old} - \varphi (\delta \times \frac{\partial out_{o1}}{\partial net_{o1}}) \times (w_{5old} \times \frac{\partial out_{h2}}{\partial net_{h2}}) \times out_{h1} \\ &w_{3new} = w_{3old} - \varphi \frac{\partial E_{o1}}{\partial w_3} = w_{3old} - \varphi (\delta \times \frac{\partial out_{o1}}{\partial net_{o1}}) \times (w_{5old} \times \frac{\partial out_{h2}}{\partial net_{h2}}) \times (w_{4old} \times \frac{\partial out_{h1}}{\partial net_{h1}}) \times x_1 \end{align*} $$
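A quick way to convince yourself that these chain-rule expressions are correct is a finite-difference gradient check. This sketch (illustrative, not part of the demo) does it for a single sigmoid neuron with squared loss; the analytic and numerical gradients should agree to several decimal places:

import math

def sigmoid(e):
    return 1 / (1 + math.exp(-e))

def loss(w, x, z):
    # E = 1/2 * (z - sigmoid(w * x))^2
    y = sigmoid(w * x)
    return 0.5 * (z - y) ** 2

w, x, z = 0.4, 0.6, 0.01

# Analytic gradient from the chain rule: dE/dw = -(z - y) * y * (1 - y) * x
y = sigmoid(w * x)
analytic = -(z - y) * y * (1 - y) * x

# Numerical gradient by central difference
eps = 1e-6
numerical = (loss(w + eps, x, z) - loss(w - eps, x, z)) / (2 * eps)

print(analytic, numerical)  # the two values should match closely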

Example

A Code Demo

Two inputs, two hidden neurons, and two output neurons; the activation function is the sigmoid.
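For reference, the sigmoid and its derivative, which is where the out(1 - out) factor in the derivation above comes from:

$$ f(e) = \frac{1}{1 + \exp(-e)}, \qquad f'(e) = f(e)\big(1 - f(e)\big) $$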

This demo shows how the weights are adjusted automatically so that the inputs [0.05, 0.10] produce outputs matching the targets [0.01, 0.09].

The code:

#coding:utf-8
import random
import math
import matplotlib.pyplot as plt


class NeuralNetwork:
    LEARNING_RATE = 0.5

    def __init__(self, num_inputs, num_hidden, num_outputs, hidden_layer_weights = None, hidden_layer_bias = None, output_layer_weights = None, output_layer_bias = None):
        self.num_inputs = num_inputs

        self.hidden_layer = NeuronLayer(num_hidden, hidden_layer_bias)
        self.output_layer = NeuronLayer(num_outputs, output_layer_bias)

        self.init_weights_from_inputs_to_hidden_layer_neurons(hidden_layer_weights)
        self.init_weights_from_hidden_layer_neurons_to_output_layer_neurons(output_layer_weights)

    def init_weights_from_inputs_to_hidden_layer_neurons(self, hidden_layer_weights):
        weight_num = 0
        for h in range(len(self.hidden_layer.neurons)):
            for i in range(self.num_inputs):
                if not hidden_layer_weights:
                    self.hidden_layer.neurons[h].weights.append(random.random())
                else:
                    self.hidden_layer.neurons[h].weights.append(hidden_layer_weights[weight_num])
                weight_num += 1

    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, output_layer_weights):
        weight_num = 0
        for o in range(len(self.output_layer.neurons)):
            for h in range(len(self.hidden_layer.neurons)):
                if not output_layer_weights:
                    self.output_layer.neurons[o].weights.append(random.random())
                else:
                    self.output_layer.neurons[o].weights.append(output_layer_weights[weight_num])
                weight_num += 1

    def inspect(self):
        print('------')
        print('* Inputs: {}'.format(self.num_inputs))
        print('------')
        print('Hidden Layer')
        self.hidden_layer.inspect()
        print('------')
        print('* Output Layer')
        self.output_layer.inspect()
        print('------')

    def feed_forward(self, inputs):
        hidden_layer_outputs = self.hidden_layer.feed_forward(inputs)
        return self.output_layer.feed_forward(hidden_layer_outputs)

    def train(self, training_inputs, training_outputs):
        self.inspect()
        self.feed_forward(training_inputs)

        # 1. Output layer deltas: dE/dnet = dE/dout * dout/dnet
        pd_errors_wrt_output_neuron_total_net_input = [0] * len(self.output_layer.neurons)
        for o in range(len(self.output_layer.neurons)):

            # ∂E/∂zⱼ
            pd_errors_wrt_output_neuron_total_net_input[o] = self.output_layer.neurons[o].calculate_pd_error_wrt_total_net_input(training_outputs[o])
        print("out di*df/de------------------------------------------------------------------------------------------>",pd_errors_wrt_output_neuron_total_net_input)
        # 2. Hidden layer deltas: propagate the output deltas back through the weights
        pd_errors_wrt_hidden_neuron_total_net_input = [0] * len(self.hidden_layer.neurons)
        for h in range(len(self.hidden_layer.neurons)):

            # dE/dyⱼ = Σ ∂E/∂zⱼ * ∂z/∂yⱼ = Σ ∂E/∂zⱼ * wᵢⱼ
            d_error_wrt_hidden_neuron_output = 0
            for o in range(len(self.output_layer.neurons)):
                print("calc pd_active*pd_out*weight,",pd_errors_wrt_output_neuron_total_net_input[o], "*", self.output_layer.neurons[o].weights[h],"+", d_error_wrt_hidden_neuron_output)
                d_error_wrt_hidden_neuron_output += pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].weights[h]
                print("Σpd_active*pd_out*weight:",d_error_wrt_hidden_neuron_output)
            print("*********************************")
            # ∂E/∂zⱼ = dE/dyⱼ * ∂yⱼ/∂zⱼ
            delta_active = self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_input()
            pd_errors_wrt_hidden_neuron_total_net_input[h] = d_error_wrt_hidden_neuron_output * delta_active
            print("pd_errors_wrt_hidden_neuron_total_net_input[", h, "]=", d_error_wrt_hidden_neuron_output, "*",
                  delta_active, "=", pd_errors_wrt_hidden_neuron_total_net_input[h])
            print("=================================")
        print("hidden di*df/de------------------------------------------------------------------------------------------>",pd_errors_wrt_hidden_neuron_total_net_input)
        # 3. Update the output layer weights
        for o in range(len(self.output_layer.neurons)):
            for w_ho in range(len(self.output_layer.neurons[o].weights)):

                # ∂Eⱼ/∂wᵢⱼ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢⱼ
                print("calc pd_active*pd_out:", pd_errors_wrt_output_neuron_total_net_input[o] )
                output_layer_input = self.output_layer.neurons[o].calculate_pd_total_net_input_wrt_weight(w_ho)
                pd_error_wrt_weight = pd_errors_wrt_output_neuron_total_net_input[o] * output_layer_input
                print("calc pd_active*pd_out*input:", pd_errors_wrt_output_neuron_total_net_input[o], "*", output_layer_input, "=", pd_error_wrt_weight)
                # Δw = α * ∂Eⱼ/∂wᵢ
                print("update out weights:", self.output_layer.neurons[o].weights[w_ho], "-=", self.LEARNING_RATE, "*",
                      pd_error_wrt_weight)
                self.output_layer.neurons[o].weights[w_ho] -= self.LEARNING_RATE * pd_error_wrt_weight
                print("-----------------------------------------------------> =",
                      self.output_layer.neurons[o].weights[w_ho])
        print("after update output weights")
        self.inspect()
        # 4. Update the hidden layer weights
        for h in range(len(self.hidden_layer.neurons)):
            for w_ih in range(len(self.hidden_layer.neurons[h].weights)):

                # ∂Eⱼ/∂wᵢ = ∂E/∂zⱼ * ∂zⱼ/∂wᵢ
                hidden_layer_input = self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_weight(w_ih)
                pd_error_wrt_weight = pd_errors_wrt_hidden_neuron_total_net_input[h] * hidden_layer_input
                print("calc pd_active*pd_hidden*input:", pd_errors_wrt_hidden_neuron_total_net_input[h], "*",
                      hidden_layer_input, "=", pd_error_wrt_weight)
                # Δw = α * ∂Eⱼ/∂wᵢ
                print("update hidden weights:", self.hidden_layer.neurons[h].weights[w_ih], "-=", self.LEARNING_RATE, "*",
                      pd_error_wrt_weight)
                self.hidden_layer.neurons[h].weights[w_ih] -= self.LEARNING_RATE * pd_error_wrt_weight
                print("-----------------------------------------------------> =",
                      self.hidden_layer.neurons[h].weights[w_ih])
        print("after update hidden weights")
        self.inspect()
    def calculate_total_error(self, training_sets):
        total_error = 0
        for t in range(len(training_sets)):
            training_inputs, training_outputs = training_sets[t]
            self.feed_forward(training_inputs)
            for o in range(len(training_outputs)):
                total_error += self.output_layer.neurons[o].calculate_error(training_outputs[o])
        return total_error

class NeuronLayer:
    def __init__(self, num_neurons, bias):

        # All neurons in the same layer share one bias
        # (use "is not None" so an explicit bias of 0 is not replaced by a random value)
        self.bias = bias if bias is not None else random.random()

        self.neurons = []
        for i in range(num_neurons):
            self.neurons.append(Neuron(self.bias))

    def inspect(self):
        print('Neurons:', len(self.neurons))
        for n in range(len(self.neurons)):
            print(' Neuron', n)
            for w in range(len(self.neurons[n].weights)):
                print('  Weight:', self.neurons[n].weights[w])
            print('  Bias:', self.bias)

    def feed_forward(self, inputs):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.calculate_output(inputs))
        return outputs

    def get_outputs(self):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.output)
        return outputs

class Neuron:
    def __init__(self, bias):
        self.bias = bias
        self.weights = []

    def calculate_output(self, inputs):
        self.inputs = inputs
        self.output = self.squash(self.calculate_total_net_input())
        return self.output

    def calculate_total_net_input(self):
        total = 0
        for i in range(len(self.inputs)):
            total += self.inputs[i] * self.weights[i]
        return total + self.bias

    # activation function: sigmoid
    def squash(self, total_net_input):
        return 1 / (1 + math.exp(-total_net_input))


    def calculate_pd_error_wrt_total_net_input(self, target_output):
        print("calc pd_active*pd_out")
        a = self.calculate_pd_error_wrt_output(target_output)
        b = self.calculate_pd_total_net_input_wrt_input()
        result = a*b
        print("pd_active*pd_out=", a, "*",b, "=",result)
        return result

    # squared error: E = 1/2 * (target - output)^2
    def calculate_error(self, target_output):
        return 0.5 * (target_output - self.output) ** 2


    def calculate_pd_error_wrt_output(self, target_output):
        print("-(target_output - self.output) : -(", target_output,"-", self.output,")=",-(target_output - self.output))
        return -(target_output - self.output)


    def calculate_pd_total_net_input_wrt_input(self):
        print("self.output * (1 - self.output) : ", self.output,"*", (1 - self.output),"=",self.output * (1 - self.output))
        return self.output * (1 - self.output)


    def calculate_pd_total_net_input_wrt_weight(self, index):
        print("self.inputs[index]:", self.inputs[index])
        return self.inputs[index]


# Example matching the network described above
nn = NeuralNetwork(2, 2, 2, hidden_layer_weights=[0.15, 0.2, 0.25, 0.3], hidden_layer_bias=0.35, output_layer_weights=[0.4, 0.45, 0.5, 0.55], output_layer_bias=0.6)
xs = []
ys = []
for i in range(1):  # change to range(10000) for full training; 1 iteration is just for debugging
    nn.train([0.05, 0.1], [0.01, 0.09])
    loss = round(nn.calculate_total_error([[[0.05, 0.1], [0.01, 0.09]]]), 9)
    xs.append(i)
    ys.append(loss)
    print(i, loss)
# bars = plt.bar(xs, ys)
plots = plt.plot(xs, ys)
plt.title('loss')
plt.xlabel('X axis - loops')
plt.ylabel('Y axis - errors')
plt.show()

# Another example: training the XOR function

# training_sets = [
#     [[0, 0], [0]],
#     [[0, 1], [1]],
#     [[1, 0], [1]],
#     [[1, 1], [0]]
# ]
#
# nn = NeuralNetwork(len(training_sets[0][0]), 5, len(training_sets[0][1]))
# for i in range(10000):
#     training_inputs, training_outputs = random.choice(training_sets)
#     nn.train(training_inputs, training_outputs)
#     print(i, nn.calculate_total_error(training_sets))
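One detail worth noticing in the run output below: the biases (0.35 and 0.6) never change, because the demo only updates weights. If you wanted to train the biases as well, the gradient is just the neuron's delta, since the net input depends on the bias with coefficient 1. A minimal sketch of that extra step for a single neuron (illustrative values; wiring this into the demo would also mean deciding how the layer-shared bias accumulates the deltas):

import math

def sigmoid(e):
    return 1 / (1 + math.exp(-e))

# One sigmoid neuron: net = w*x + b, out = sigmoid(net), E = 1/2 * (z - out)^2
w, b, x, z, lr = 0.4, 0.6, 0.59, 0.01, 0.5

out = sigmoid(w * x + b)
delta = -(z - out) * out * (1 - out)  # dE/dnet, the same delta used for the weights

# dnet/db = 1, so the bias gradient is simply delta
b -= lr * delta
print(b)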

Run output for a single loop:

D:\software\anaconda3\envs\mini_backpropagation\python.exe D:\project\mini_backpropagation\NeuralNetwork.py 
------
* Inputs: 2
------
Hidden Layer
Neurons: 2
 Neuron 0
  Weight: 0.15
  Weight: 0.2
  Bias: 0.35
 Neuron 1
  Weight: 0.25
  Weight: 0.3
  Bias: 0.35
------
* Output Layer
Neurons: 2
 Neuron 0
  Weight: 0.4
  Weight: 0.45
  Bias: 0.6
 Neuron 1
  Weight: 0.5
  Weight: 0.55
  Bias: 0.6
------
calc pd_active*pd_out
-(target_output - self.output) : -( 0.01 - 0.7513650695523157 )= 0.7413650695523157
self.output * (1 - self.output) :  0.7513650695523157 * 0.24863493044768425 = 0.18681560180895948
pd_active*pd_out= 0.7413650695523157 * 0.18681560180895948 = 0.13849856162855698
calc pd_active*pd_out
-(target_output - self.output) : -( 0.09 - 0.7729284653214625 )= 0.6829284653214626
self.output * (1 - self.output) :  0.7729284653214625 * 0.22707153467853747 = 0.17551005281727122
pd_active*pd_out= 0.6829284653214626 * 0.17551005281727122 = 0.11986081101898788
out di*df/de------------------------------------------------------------------------------------------> [0.13849856162855698, 0.11986081101898788]
calc pd_active*pd_out*weight, 0.13849856162855698 * 0.4 + 0
Σpd_active*pd_out*weight: 0.05539942465142279
calc pd_active*pd_out*weight, 0.11986081101898788 * 0.5 + 0.05539942465142279
Σpd_active*pd_out*weight: 0.11532983016091673
*********************************
self.output * (1 - self.output) :  0.5932699921071872 * 0.4067300078928128 = 0.24130070857232525
pd_errors_wrt_hidden_neuron_total_net_input[ 0 ]= 0.11532983016091673 * 0.24130070857232525 = 0.027829169737355136
=================================
calc pd_active*pd_out*weight, 0.13849856162855698 * 0.45 + 0
Σpd_active*pd_out*weight: 0.06232435273285064
calc pd_active*pd_out*weight, 0.11986081101898788 * 0.55 + 0.06232435273285064
Σpd_active*pd_out*weight: 0.12824779879329398
*********************************
self.output * (1 - self.output) :  0.596884378259767 * 0.40311562174023297 = 0.2406134172492184
pd_errors_wrt_hidden_neuron_total_net_input[ 1 ]= 0.12824779879329398 * 0.2406134172492184 = 0.03085814112234465
=================================
hidden di*df/de------------------------------------------------------------------------------------------> [0.027829169737355136, 0.03085814112234465]
calc pd_active*pd_out: 0.13849856162855698
self.inputs[index]: 0.5932699921071872
calc pd_active*pd_out*input: 0.13849856162855698 * 0.5932699921071872 = 0.08216704056423078
update out weights: 0.4 -= 0.5 * 0.08216704056423078
-----------------------------------------------------> = 0.35891647971788465
calc pd_active*pd_out: 0.13849856162855698
self.inputs[index]: 0.596884378259767
calc pd_active*pd_out*input: 0.13849856162855698 * 0.596884378259767 = 0.08266762784753326
update out weights: 0.45 -= 0.5 * 0.08266762784753326
-----------------------------------------------------> = 0.4086661860762334
calc pd_active*pd_out: 0.11986081101898788
self.inputs[index]: 0.5932699921071872
calc pd_active*pd_out*input: 0.11986081101898788 * 0.5932699921071872 = 0.071109822407196
update out weights: 0.5 -= 0.5 * 0.071109822407196
-----------------------------------------------------> = 0.464445088796402
calc pd_active*pd_out: 0.11986081101898788
self.inputs[index]: 0.596884378259767
calc pd_active*pd_out*input: 0.11986081101898788 * 0.596884378259767 = 0.07154304566278001
update out weights: 0.55 -= 0.5 * 0.07154304566278001
-----------------------------------------------------> = 0.5142284771686101
after update output weights
------
* Inputs: 2
------
Hidden Layer
Neurons: 2
 Neuron 0
  Weight: 0.15
  Weight: 0.2
  Bias: 0.35
 Neuron 1
  Weight: 0.25
  Weight: 0.3
  Bias: 0.35
------
* Output Layer
Neurons: 2
 Neuron 0
  Weight: 0.35891647971788465
  Weight: 0.4086661860762334
  Bias: 0.6
 Neuron 1
  Weight: 0.464445088796402
  Weight: 0.5142284771686101
  Bias: 0.6
------
self.inputs[index]: 0.05
calc pd_active*pd_hidden*input: 0.027829169737355136 * 0.05 = 0.001391458486867757
update hidden weights: 0.15 -= 0.5 * 0.001391458486867757
-----------------------------------------------------> = 0.1493042707565661
self.inputs[index]: 0.1
calc pd_active*pd_hidden*input: 0.027829169737355136 * 0.1 = 0.002782916973735514
update hidden weights: 0.2 -= 0.5 * 0.002782916973735514
-----------------------------------------------------> = 0.19860854151313226
self.inputs[index]: 0.05
calc pd_active*pd_hidden*input: 0.03085814112234465 * 0.05 = 0.0015429070561172327
update hidden weights: 0.25 -= 0.5 * 0.0015429070561172327
-----------------------------------------------------> = 0.24922854647194137
self.inputs[index]: 0.1
calc pd_active*pd_hidden*input: 0.03085814112234465 * 0.1 = 0.0030858141122344653
update hidden weights: 0.3 -= 0.5 * 0.0030858141122344653
-----------------------------------------------------> = 0.29845709294388273
after update hidden weights
------
* Inputs: 2
------
Hidden Layer
Neurons: 2
 Neuron 0
  Weight: 0.1493042707565661
  Weight: 0.19860854151313226
  Bias: 0.35
 Neuron 1
  Weight: 0.24922854647194137
  Weight: 0.29845709294388273
  Bias: 0.35
------
* Output Layer
Neurons: 2
 Neuron 0
  Weight: 0.35891647971788465
  Weight: 0.4086661860762334
  Bias: 0.6
 Neuron 1
  Weight: 0.464445088796402
  Weight: 0.5142284771686101
  Bias: 0.6
------
0 0.496045683

Process finished with exit code 0

After 1000 loops, the network looks like this; the final loss is 0.000586103:

------
* Inputs: 2
------
Hidden Layer
Neurons: 2
 Neuron 0
  Weight: 0.2964604103620042
  Weight: 0.49292082072400834
  Bias: 0.35
 Neuron 1
  Weight: 0.39084333156627366
  Weight: 0.5816866631325477
  Bias: 0.35
------
* Output Layer
Neurons: 2
 Neuron 0
  Weight: -3.060957226462873
  Weight: -3.0308626603447846
  Bias: 0.6
 Neuron 1
  Weight: -2.393475400842236
  Weight: -2.3602088337272704
  Bias: 0.6
------
999 0.000586103