204.5.14 Neural Network Appendix

In this post we will discuss the math behind a few steps of Neural Network algorithms.

Math- How to update the weights?

  • We update the weights backwards by iteratively calculating the error.
  • The weights are updated using the gradient descent method, via the delta rule (also known as the Widrow-Hoff rule).
  • First we calculate the weight corrections for the output layer, then we take care of the hidden layers.
  • \(W_{jk} = W_{jk} + \Delta W_{jk}\)
    • where \(\Delta W_{jk} = \eta \, y_j \, \delta_k\)
    • \(\eta\) is the learning parameter (learning rate)
    • \(\delta_k = y_k (1 - y_k) \cdot Err\) for the output layer; for a hidden neuron, \(\delta_j = y_j (1 - y_j) \sum_k w_{jk} \, \delta_k\), i.e. the downstream errors are propagated back through the weights
    • \(Err\) = Expected output - Actual output
  • The weight correction is calculated based on the error function.
  • The new weights are chosen in such a way that the final error in the network is minimized.
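The update above can be sketched for a single output neuron; the values of \(\eta\), \(y_j\), \(w_{jk}\), and the target here are hypothetical, chosen only to illustrate the formula:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for one connection j -> k
eta = 0.5                    # learning parameter
y_j = 0.8                    # output of hidden neuron j
w_jk = 0.4                   # current weight from j to k
y_k = sigmoid(w_jk * y_j)    # output of output neuron k
target = 1.0                 # expected output

err = target - y_k                   # Err = expected - actual
delta_k = y_k * (1 - y_k) * err      # delta for the output layer
w_jk = w_jk + eta * y_j * delta_k    # W_jk = W_jk + eta * y_j * delta_k
```

Since the error is positive here, the correction pushes the weight up, moving the neuron's output towards the target.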

Math-How does the delta rule work?

  • Let's consider a simple example to understand weight updating using the delta rule.
  • Suppose we are building a simple logistic regression model and want to find the weight using the weight update rule.
  • \(Y = \frac{1}{1 + e^{-wx}}\) is the equation.
  • We are searching for the optimal \(w\) for our data.
  • Let \(w\) be 1.
  • \(Y = \frac{1}{1 + e^{-x}}\) is the initial equation.
  • The error at this initial step is 3.59.
  • To reduce the error, we add a delta to \(w\) and make it 1.5.
  • Now \(w\) is 1.5.
  • \(Y = \frac{1}{1 + e^{-1.5x}}\) is the updated equation.
  • With the updated weight, the error is 1.57.
  • We can reduce the error further by increasing \(w\) by another delta.
  • If we repeat this process of adding a delta and updating the weight, we eventually end up with the minimum error.
  • The weight at that final step is the optimal weight.
  • In this example the final weight is 8 and the error is 0.
  • \(Y = \frac{1}{1 + e^{-8x}}\) is the final equation.
  • In this example, we manually changed the weights to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems.
  • Gradient descent is a systematic optimization method: we update the weights by calculating the gradient of the error function.
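The manual weight sweep above can be reproduced in code. The post does not show the underlying data, so this sketch uses a hypothetical dataset generated from a steep sigmoid (true \(w = 8\)); the exact error values will differ from the post's 3.59 and 1.57, but the pattern, error shrinking as \(w\) approaches 8, is the same:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_error(w, data):
    """Sum of squared errors of y = 1/(1+e^{-wx}) over (x, y) pairs."""
    return sum((y - sigmoid(w * x)) ** 2 for x, y in data)

# Hypothetical data generated from a steep sigmoid (true w = 8)
data = [(x / 10.0, sigmoid(8 * x / 10.0)) for x in range(-20, 21)]

# Try the same candidate weights as in the walkthrough
for w in (1.0, 1.5, 8.0):
    print(f"w = {w}: error = {total_error(w, data):.4f}")
```

Each step towards the true weight lowers the total error, and at \(w = 8\) the error reaches 0 on this data.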

How does gradient descent work?

  • Gradient descent is one of the most widely used methods for finding a local minimum of a function.
  • By changing the weights we move towards the minimum of the error function. The weights are changed by taking steps in the direction of the negative gradient (derivative) of the function.
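A minimal sketch of this for a single sigmoid neuron: for the squared error \(E = \frac{1}{2}(t - y)^2\) with \(y = \sigma(wx)\), the gradient is \(\frac{dE}{dw} = -(t - y)\, y (1 - y)\, x\), so stepping against the gradient reproduces the delta rule. The training example and learning rate below are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single training example
x, target = 1.0, 1.0
w, eta = 0.0, 1.0

for step in range(200):
    y = sigmoid(w * x)
    # Gradient of E = 0.5 * (target - y)^2 with respect to w
    grad = -(target - y) * y * (1 - y) * x
    w = w - eta * grad   # step in the negative gradient direction

print(round(sigmoid(w * x), 3))
```

Each step moves \(w\) downhill on the error surface, so the prediction drifts from 0.5 towards the target of 1.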

Does this method really work?

  • We changed the weights; did that reduce the overall error?
  • Let's calculate the error with the new weights and see the change.

Gradient Descent Method Validation

  • With our initial set of weights, Y actual is 0 and Y predicted is 0.7137, so the overall error is 0.7137.
  • The new weights give us a predicted value of 0.70655.
  • In one iteration, we reduced the error from 0.7137 to 0.70655.
  • The error was reduced by about 1%. Repeating the same process over multiple epochs and training examples reduces the error further.
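The post does not show the network and weights behind these numbers, so this sketch recreates the idea with a single sigmoid neuron whose initial prediction is 0.7137 for a target of 0, and checks that one delta-rule step lowers the error. The input \(x\) and learning rate are hypothetical, so the new prediction will not be exactly 0.70655, only slightly smaller than 0.7137:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, target = 1.0, 0.0
# Choose w so that the initial prediction is 0.7137, as in the post
w = math.log(0.7137 / (1 - 0.7137))
eta = 0.1

y = sigmoid(w * x)                    # initial prediction, ~0.7137
delta = y * (1 - y) * (target - y)    # delta-rule correction
w = w + eta * x * delta               # one weight update

y_new = sigmoid(w * x)                # prediction after one iteration
print(abs(target - y), "->", abs(target - y_new))
```

One iteration nudges the prediction down towards the target of 0; repeating the update over many epochs would drive the error towards its minimum.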

