204.4.6 Type of Datasets, Type of Errors and Problem of Overfitting

Things to know before proceeding further.
Link to the previous post : https://statinfer.com/204-4-5-what-is-a-best-model/

Different Type of Datasets and Errors

The Training Error

  • The accuracy of our best model is 95%. Is a model with a 5% error rate really good?
  • The error on the training data is known as training error.
  • A low error rate on training data may not always mean the model is good.
  • What really matters is how the model is going to perform on unknown data or test data.
  • We need a way to estimate the error rate on test data.
  • We may have to set aside a part of the data and use it for validation.
  • There are two types of datasets and two types of errors.

Two Types of Datasets

  • There are two types of datasets.
  • Training set: The input data used to build the model.
  • Test set: The unknown dataset. This dataset gives the accuracy of the final model.
  • We may not have access to both datasets in every machine learning problem. In such cases, we can take 90% of the available data as training data and treat the remaining 10% as validation data.
  • Validation set: This dataset is kept aside for model validation and selection. It is a temporary substitute for the test dataset, not a third type of data.
  • We create the validation data in the hope that its error rate will give us a basic idea of the test error.
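
The 90%/10% split described above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function name `train_validation_split`, the `seed` argument, and the placeholder `records` list are assumptions made for this example and are not part of the original post.

```python
import random

def train_validation_split(rows, validation_fraction=0.10, seed=42):
    """Shuffle the rows and hold out a fraction of them for validation."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_validation]   # kept aside, never used for model building
    training = shuffled[n_validation:]     # used for model building
    return training, validation

# Hypothetical data: 100 records, represented here only by their ids.
records = list(range(100))
training_set, validation_set = train_validation_split(records)
print(len(training_set), len(validation_set))  # 90 10
```

Shuffling before the split is a design choice for this sketch; it avoids holding out only the last rows of a file that may be sorted in some way.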

Types of Errors

  • The training error
      • The error on the training dataset
      • In-time error
      • Error on the known data
      • Can be reduced while building the model
  • The test error
      • The error that matters
      • Out-of-time error
      • The error on the unknown/new dataset
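
To make the two error types concrete, here is a small sketch that computes both for the same model. Everything in it is an assumption chosen for illustration: the tiny hand-made datasets, the helper names `predict_nearest` and `error_rate`, and the nearest-neighbour rule itself, which is used only because it memorises the training data and so shows how a low training error can hide a high test error.

```python
def predict_nearest(training_rows, x):
    """Predict the label of the training point whose x value is closest."""
    nearest = min(training_rows, key=lambda row: abs(row[0] - x))
    return nearest[1]

def error_rate(training_rows, rows):
    """Fraction of rows whose predicted label differs from the actual label."""
    wrong = sum(1 for x, label in rows if predict_nearest(training_rows, x) != label)
    return wrong / len(rows)

# Hypothetical (x, label) pairs. The training set is the known data;
# the test set stands in for unknown data the model has never seen.
training_rows = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
                 (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
test_rows = [(1.2, 0), (2.8, 0), (3.9, 1), (5.9, 1), (7.8, 1), (9.1, 1)]

training_error = error_rate(training_rows, training_rows)  # error on known data
test_error = error_rate(training_rows, test_rows)          # error on unknown data

print("training error:", training_error)
print("test error:", test_error)
```

On this made-up data the training error is zero while half of the test predictions are wrong, which is why the training error alone is not enough to judge a model.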

“A good model will have both training and test error very near to each other and close to zero”

Yes, this post is quite short, but we need to know the types of errors and the types of datasets in order to validate a model.

The next post is about the problem of overfitting.

Link to the next post : https://statinfer.com/204-4-7-problem-of-overfitting/

© 2020. All Rights Reserved.