204.4.9 Model Bias-Variance Tradeoff

Link to the previous post : https://statinfer.com/204-4-8-problem-of-under-fitting/

Overfitting
- Low Bias with High Variance
- Low training error – ‘Low Bias’
- High testing error
- Unstable model – ‘High Variance’
- The coefficients of the model change noticeably with small changes in the data
Underfitting
- High Bias with Low Variance
- High training error – ‘High Bias’
- Testing error almost equal to training error
- Stable model – ‘Low Variance’
- The coefficients of the model barely change with small changes in the data
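
The contrast between the two lists can be reproduced with a small simulation. The following is only a sketch under stated assumptions: the sine curve used as the true relationship, the noise level, the sample sizes, and the helper names `true_f` and `average_errors` are all made up for illustration, and polynomial degree stands in for model complexity.

```python
import numpy as np

def true_f(x):
    # Assumed "true" relationship; chosen only for illustration.
    return np.sin(2 * np.pi * x)

def average_errors(degree, n_train=15, n_test=200, sigma=0.3,
                   n_repeats=200, seed=0):
    """Average training and testing MSE of a polynomial fit
    over many simulated datasets."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)
    train_mse, test_mse = [], []
    for _ in range(n_repeats):
        # Fresh noisy training sample and an independent test sample.
        y_train = true_f(x_train) + rng.normal(0, sigma, n_train)
        x_test = rng.uniform(0, 1, n_test)
        y_test = true_f(x_test) + rng.normal(0, sigma, n_test)
        # Least-squares polynomial fit of the requested degree.
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        train_mse.append(np.mean((y_train - model(x_train)) ** 2))
        test_mse.append(np.mean((y_test - model(x_test)) ** 2))
    return float(np.mean(train_mse)), float(np.mean(test_mse))

# Degree 1 is too simple for a sine curve; degree 10 is too flexible
# for 15 training points.
underfit_train, underfit_test = average_errors(degree=1)
overfit_train, overfit_test = average_errors(degree=10)

print("underfit: train", underfit_train, "test", underfit_test)
print("overfit:  train", overfit_train, "test", overfit_test)
```

Under these assumptions, the straight line should show a high training error with a testing error close to it, while the degree-10 fit should show a very low training error and a clearly higher testing error.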

$Y = f(X) + \epsilon$, $\mathrm{Var}(\epsilon) = \sigma^{2}$

$\mathrm{Squared\ Error} = E[(Y - \hat{f}(x_{0}))^{2} \mid X = x_{0}]$

= $\sigma^{2} + [E\hat{f}(x_{0}) - f(x_{0})]^{2} + E[\hat{f}(x_{0}) - E\hat{f}(x_{0})]^{2}$

= $\sigma^{2} + \mathrm{Bias}^{2}(\hat{f}(x_{0})) + \mathrm{Var}(\hat{f}(x_{0}))$

Overall Model Squared Error = Irreducible Error + $Bias^2$ + Variance
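
The decomposition can be checked numerically at a single point $x_0$. The sketch below refits a polynomial on many simulated training sets and estimates each term separately. The true function, the noise level $\sigma = 0.3$, the point $x_0 = 0.3$, and the helper name `decompose_at_x0` are assumptions made for this illustration, not part of the original derivation.

```python
import numpy as np

def true_f(x):
    # Assumed "true" relationship; chosen only for illustration.
    return np.sin(2 * np.pi * x)

def decompose_at_x0(degree, x0=0.3, n_train=30, sigma=0.3,
                    n_sims=2000, seed=0):
    """Monte Carlo estimate of squared error, bias^2 and variance
    of a polynomial fit at the point x0."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, n_train)      # fixed design points
    preds = np.empty(n_sims)
    sq_errors = np.empty(n_sims)
    for i in range(n_sims):
        # New training sample: same x, fresh noise.
        y = true_f(x) + rng.normal(0, sigma, n_train)
        model = np.polynomial.Polynomial.fit(x, y, degree)
        preds[i] = model(x0)
        # Independent noisy observation at x0 to score the prediction.
        y0 = true_f(x0) + rng.normal(0, sigma)
        sq_errors[i] = (y0 - preds[i]) ** 2
    return {
        "mse": float(sq_errors.mean()),
        "noise": sigma ** 2,                                  # irreducible error
        "bias_sq": float((preds.mean() - true_f(x0)) ** 2),   # (E f_hat - f)^2
        "variance": float(preds.var()),                       # E (f_hat - E f_hat)^2
    }

low = decompose_at_x0(degree=1)    # simple model
high = decompose_at_x0(degree=9)   # flexible model

for name, parts in (("degree 1", low), ("degree 9", high)):
    total = parts["noise"] + parts["bias_sq"] + parts["variance"]
    print(name, parts, "sum of parts:", total)
```

In this setup the simulated squared error should roughly match the sum of the three parts for both models, with the simple model dominated by the bias term and the flexible model carrying the larger variance term.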

The overall error is driven by bias and variance together.
Both extremes, high bias with low variance and low bias with high variance, hurt the overall accuracy of the model.
A good model needs low bias and low variance or, failing that, a level of complexity at which the two are jointly as low as possible.
How do we choose that optimal model complexity?

Unfortunately, there is no formula that directly gives the model complexity with the minimum test error.
Training error is not a good estimate of the test error, because it typically keeps falling as the model becomes more complex.
There is always a bias-variance tradeoff in choosing the appropriate complexity of the model.
We can use resampling methods such as cross-validation and bootstrapping to estimate the test error, and bagging to reduce variance, when choosing an optimal and stable model.
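
As a rough preview of how cross-validation can guide the choice of complexity, the sketch below scores polynomial degrees 1 to 10 with 5-fold cross-validation and picks the degree with the lowest validation error. It is a minimal illustration, not a recommended procedure: the simulated data, the candidate degrees, and the helper name `kfold_cv_mse` are assumptions introduced here.

```python
import numpy as np

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial fit under k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once and split them into k folds.
    folds = np.array_split(rng.permutation(len(x)), k)
    fold_mse = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on k-1 folds, score on the held-out fold.
        model = np.polynomial.Polynomial.fit(x[train_idx], y[train_idx], degree)
        fold_mse.append(np.mean((y[val_idx] - model(x[val_idx])) ** 2))
    return float(np.mean(fold_mse))

# Simulated data: an assumed sine relationship plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 60)

# The same fold split (same seed) is reused for every candidate degree.
cv_scores = {degree: kfold_cv_mse(x, y, degree) for degree in range(1, 11)}
best_degree = min(cv_scores, key=cv_scores.get)

for degree, score in cv_scores.items():
    print(degree, score)
print("degree with lowest CV error:", best_degree)
```

The validation error is computed only on data the model did not see during fitting, which is why it is a more honest guide to test error than the training error.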

The next post is about cross validation.

Link to the next post : https://statinfer.com/204-4-10-cross-validation/

22nd June 2017