
# 204.4.8 Problem of Under-fitting

##### What happens if the model is under-fitted? Huge bias?
`Link to the previous post: https://statinfer.com/204-4-7-problem-of-overfitting/`

### The Problem of Under-fitting

• Simple models are better. That is often true, but is it always true? Not necessarily.
• We might have given up too early. Did we really capture all the information in the data?
• Did we do enough research and feature engineering to fit the best model? Is it the best model that can be fit on this data?
• By being over-cautious about the variance of the parameters, we might miss genuine patterns in the data.
• A model needs to be complex enough to capture all the information present in the data.
• If the training error itself is high, how can we be sure about the model's performance on unseen data?
• Accuracy and error statistics computed on the training data tell us the training error directly. This is one advantage of under-fitting: we can identify it confidently, without needing unseen data.
• Under-fitting means:
  • A model that is too simple
  • A model with scope for improvement
  • A model with a lot of bias
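The points above can be condensed into a rough check: a high training error points to bias, and a validation error much larger than the training error points to variance. The sketch below is a hypothetical helper, not part of the original post; the thresholds `acceptable_error=0.10` and `max_gap=0.05` are illustrative assumptions that should really come from a baseline or business requirement for your own problem. The usage line plugs in the training and validation accuracies of the over-simplified tree reported later in this post.

```python
def diagnose_fit(train_error, val_error, acceptable_error=0.10, max_gap=0.05):
    """Rough bias/variance reading of a fitted model (hypothetical helper).

    acceptable_error and max_gap are illustrative thresholds, not standard values.
    """
    high_bias = train_error > acceptable_error             # cannot even fit the training data
    high_variance = (val_error - train_error) > max_gap    # much worse on unseen data
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (under-fitting)"
    if high_variance:
        return "high variance (over-fitting)"
    return "looks balanced"


# Over-simplified tree from this post: error = 1 - accuracy
print(diagnose_fit(train_error=1 - 0.6823, val_error=1 - 0.6891))
```

With these numbers the training error (about 0.32) is far above the assumed acceptable level while the gap is almost zero, so the helper reports high bias.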

### Practice : Model with huge Bias

• Let's simplify the model.
• Take the high-variance model and prune it.
• Make it as simple as possible.
• Find the training error and validation error.
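The practice steps can be sketched end to end before looking at the solution. This is a minimal sketch under stated assumptions: the post's own `X_train`/`y_train` come from earlier in the series, so here a synthetic dataset from `make_classification` (2000 rows, 10 features, `random_state=0`) stands in for them. It grows a fully grown, high-variance tree, prunes it with the same limits used below, then reduces it to a single split, and reports training and validation error (1 minus the accuracy returned by `score()`) for each.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the post's data (assumption)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# From the fully grown (high-variance) tree down to a one-split stump
settings = {
    "unpruned": dict(),
    "pruned": dict(max_depth=10, min_samples_split=30,
                   min_samples_leaf=30, max_leaf_nodes=20),
    "stump": dict(max_depth=1),
}

errors = {}
for name, params in settings.items():
    model = tree.DecisionTreeClassifier(random_state=0, **params)
    model.fit(X_train, y_train)
    train_err = 1 - model.score(X_train, y_train)   # score() returns mean accuracy
    val_err = 1 - model.score(X_test, y_test)
    errors[name] = (train_err, val_err)
    print(f"{name:9s} train error={train_err:.3f}  validation error={val_err:.3f}")
```

The unpruned tree memorises the training set (zero training error), while the stump's training error is clearly above zero: the training error alone already reveals the under-fitted model.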

Solution

In :
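```
# We can prune the tree by changing its growth parameters
# (X_train, y_train are assumed to be defined earlier in this series)
from sklearn import tree

tree_bias = tree.DecisionTreeClassifier(criterion='gini',
                                        splitter='best',
                                        max_depth=10,           # at most 10 levels
                                        min_samples_split=30,   # a node needs 30 samples to be split
                                        min_samples_leaf=30,    # every leaf keeps at least 30 samples
                                        max_leaf_nodes=20)      # at most 20 leaves in total
tree_bias.fit(X_train, y_train)
```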
Out:
```
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,
max_features=None, max_leaf_nodes=20, min_samples_leaf=30,
min_samples_split=30, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
```
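To confirm that the pruning parameters actually shrank the tree, the fitted model can report its size. A minimal sketch, assuming synthetic `make_classification` data in place of the post's dataset and scikit-learn 0.21 or newer (where `get_depth()` and `get_n_leaves()` are available; the `presort` entry in the output above suggests the post used an older release). It compares a fully grown tree with one built using the same settings as `tree_bias`.

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the post's X_train / y_train (assumption)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

# fit() returns the estimator itself, so it can be chained
full = tree.DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = tree.DecisionTreeClassifier(criterion='gini', splitter='best',
                                     max_depth=10, min_samples_split=30,
                                     min_samples_leaf=30, max_leaf_nodes=20,
                                     random_state=0).fit(X_train, y_train)

# get_depth() and get_n_leaves() describe the tree that was actually grown
for name, model in [("full", full), ("pruned", pruned)]:
    print(name, "depth:", model.get_depth(), "leaves:", model.get_n_leaves())
```

The limits are upper bounds: the pruned tree can never exceed 20 leaves or depth 10, but it may stop earlier if `min_samples_leaf` blocks further splits.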
In :
```
# Training accuracy
tree_bias.score(X_train, y_train)
```
Out:
`0.85344444444444445`
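The practice asks for training and validation *error*, whereas `score()` returns mean accuracy, and only the training side is shown for `tree_bias`. Scikit-learn's `zero_one_loss` returns the fraction of misclassified samples, which equals 1 minus the accuracy. A minimal sketch, assuming synthetic `make_classification` data in place of the post's dataset and `random_state=0` added for reproducibility, computing both errors for the same pruning settings as `tree_bias`:

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, zero_one_loss
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the post's train/test split (assumption)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

tree_bias = tree.DecisionTreeClassifier(criterion='gini', splitter='best',
                                        max_depth=10, min_samples_split=30,
                                        min_samples_leaf=30, max_leaf_nodes=20,
                                        random_state=0)
tree_bias.fit(X_train, y_train)

y_train_pred = tree_bias.predict(X_train)
y_test_pred = tree_bias.predict(X_test)

# zero_one_loss = fraction of misclassified samples = 1 - accuracy
train_error = zero_one_loss(y_train, y_train_pred)
val_error = zero_one_loss(y_test, y_test_pred)
print(f"training accuracy   {accuracy_score(y_train, y_train_pred):.3f} -> error {train_error:.3f}")
print(f"validation accuracy {accuracy_score(y_test, y_test_pred):.3f} -> error {val_error:.3f}")
```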
In :
```
# Let's prune the tree further and over-simplify the model:
# a single split (max_depth=1, max_leaf_nodes=2);
# splitter='random' takes the best of randomly drawn splits
tree_bias1 = tree.DecisionTreeClassifier(criterion='gini',
                                         splitter='random',
                                         max_depth=1,
                                         min_samples_split=100,
                                         min_samples_leaf=100,
                                         max_leaf_nodes=2)
tree_bias1.fit(X_train, y_train)
```
Out:
```
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=1,
max_features=None, max_leaf_nodes=2, min_samples_leaf=100,
min_samples_split=100, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='random')
```
In :
```
# Training accuracy of the new model
tree_bias1.score(X_train, y_train)
```
Out:
`0.68231111111111109`
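Because `tree_bias1` uses `splitter='random'` with `random_state=None` (see its output above), its split is drawn at random, so rerunning the cell can give a different accuracy from the one shown. A minimal sketch, assuming synthetic `make_classification` data in place of the post's dataset and a hypothetical helper `stump_accuracy`, showing that fixing `random_state` makes the over-simplified tree reproducible:

```python
from sklearn import tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the post's X_train / y_train (assumption)
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)


def stump_accuracy(seed):
    """Training accuracy of tree_bias1's settings with an explicit random_state."""
    stump = tree.DecisionTreeClassifier(criterion='gini', splitter='random',
                                        max_depth=1, min_samples_split=100,
                                        min_samples_leaf=100, max_leaf_nodes=2,
                                        random_state=seed)
    stump.fit(X_train, y_train)
    return stump.score(X_train, y_train)


# Same seed -> same tree and score; other seeds may pick a different split
print("seed 0 twice:", stump_accuracy(0), stump_accuracy(0))
scores = [stump_accuracy(seed) for seed in range(5)]
print("seeds 0-4:  ", [round(s, 3) for s in scores])
```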
In :
```
# Validation accuracy on test data
tree_bias1.score(X_test, y_test)
```
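Out:
`0.68910000000000005`

Both the training accuracy (0.682) and the validation accuracy (0.689) are low and almost identical. The model performs equally poorly on seen and unseen data, which is the signature of high bias: compared with the training accuracy of 0.853 for the less aggressively pruned tree, this model has clearly under-fitted.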

In the next post, we will discuss how to choose the optimal model using the bias-variance trade-off.