
# 204.5.14 Neural Network Appendix

Link to the previous post: https://statinfer.com/204-5-13-neural-networks-conclusion/

In this post we will discuss the math behind a few steps of the neural network algorithm.

#### Math: How do we update the weights?

• We update the weights backwards, iteratively calculating the error at each layer.
• The weights are updated using the gradient descent method, also called the delta rule or the Widrow-Hoff rule.
• First we calculate the weight corrections for the output layer, then we take care of the hidden layers.
• $$W_{jk} = W_{jk} + \Delta W_{jk}$$
• where $$\Delta W_{jk} = \eta \, y_j \delta_k$$
• $$\eta$$ is the learning rate
• $$\delta_k = y_k (1 - y_k) \cdot Err$$ (for hidden layers, $$\delta_k = y_k (1 - y_k) \cdot w_{jk} \cdot Err$$)
• $$Err$$ = Expected output − Actual output
• The weight corrections are calculated based on the error function.
• The new weights are chosen in such a way that the final error in the network is minimized.
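The update rule above can be sketched in a few lines. This is a minimal illustration for a single output-layer weight with a sigmoid unit; the variable names and the example values are illustrative, not taken from the post.

```python
def delta_rule_update(w_jk, y_j, y_k, expected, eta=0.5):
    """One delta-rule (Widrow-Hoff) update for an output-layer weight.

    w_jk     : current weight from hidden unit j to output unit k
    y_j      : output of hidden unit j
    y_k      : actual (sigmoid) output of unit k
    expected : target output for unit k
    eta      : learning rate
    """
    err = expected - y_k               # Err = Expected output - Actual output
    delta_k = y_k * (1 - y_k) * err    # delta_k = y_k (1 - y_k) * Err
    return w_jk + eta * y_j * delta_k  # W_jk = W_jk + eta * y_j * delta_k

# Hypothetical example: hidden output 0.8, current prediction 0.6, target 1
new_w = delta_rule_update(w_jk=0.4, y_j=0.8, y_k=0.6, expected=1.0)
print(new_w)
```

Because the target (1.0) is above the current prediction (0.6), the error and hence the correction are positive, so the weight moves up.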

#### Math: How does the delta rule work?

• Let's consider a simple example to understand weight updating using the delta rule.
• Suppose we are fitting a simple logistic regression line and want to find the weight using the weight update rule.
• $$Y = \frac{1}{1+e^{-wx}}$$ is the equation.
• We are searching for the optimal w for our data.
• Let w be 1.
• $$Y = \frac{1}{1+e^{-x}}$$ is the initial equation.
• The error at this initial step is 3.59.
• To reduce the error, we add a delta to w and make it 1.5.
• Now w is 1.5.
• $$Y = \frac{1}{1+e^{-1.5x}}$$ is the updated equation.
• With the updated weight, the error is 1.57.
• We can further reduce the error by increasing w by another delta.
• If we repeat this process of adding a delta and updating the weight, we eventually reach the minimum error.
• The weight at that final step is the optimal weight.
• In this example the weight is 8, and the error is 0.
• $$Y = \frac{1}{1+e^{-8x}}$$ is the final equation.
• In this example, we changed the weights manually to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems.
• Gradient descent is a systematic optimization method: we update the weights by calculating the gradient of the error function.
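The manual search above can be reproduced with a short sketch. The training points below are hypothetical stand-ins (they are not the post's actual data, so the error values differ from those quoted), but they show the same pattern: the total error shrinks as w grows toward its optimum.

```python
import math

def sigmoid(w, x):
    """Logistic curve Y = 1 / (1 + e^(-w*x))."""
    return 1.0 / (1.0 + math.exp(-w * x))

# Hypothetical data: target 0 for negative x, target 1 for positive x,
# so a larger w sharpens the curve and reduces the error.
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]

def total_error(w):
    """Sum of absolute errors over the toy data set."""
    return sum(abs(y - sigmoid(w, x)) for x, y in data)

# Try the same sequence of weights as in the walkthrough: 1, 1.5, ..., 8
for w in [1.0, 1.5, 8.0]:
    print(f"w = {w}: error = {total_error(w):.4f}")
```

Each increase of w lowers the total error, and at w = 8 the error is essentially zero, mirroring the manual walkthrough.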

### How does gradient descent work?

• Gradient descent is a well-known method for finding a local minimum of a function.
• By changing the weights, we move towards the minimum of the error function. The weights are changed by taking steps in the negative direction of the function's gradient (derivative).

#### Does this method really work?

• We changed the weights; did it reduce the overall error?
• Let's calculate the error with the new weights and see the change.
• With our initial set of weights the overall error was 0.7137: Y actual is 0, Y predicted is 0.7137, so error = 0.7137.
• The new weights give us a predicted value of 0.70655.
• In one iteration, we reduced the error from 0.7137 to 0.70655.
• The error is reduced by about 1%. If we repeat the same process over multiple epochs and training examples, we can reduce the error further.

#### Statinfer

Statinfer, derived from "statistical inference", is a company that focuses on data science training and R&D. We offer training on Machine Learning, Deep Learning, and Artificial Intelligence using tools like R, Python, and TensorFlow.