
204.4.12 Bootstrap Cross Validation

Link to the previous post : https://statinfer.com/204-4-11-k-fold-cross-validation/


This will be the last post in our Model Selection and Cross Validation series.

Bootstrap Methods

  • Bootstrapping is a powerful tool for estimating the accuracy of a model and its test error.
  • It can estimate the likely future performance of a given modeling procedure on new data not yet seen.
  • The Algorithm
    • Start with a training dataset of size N.
    • Draw a random sample of size N with replacement. This gives a new dataset; it may contain repeated observations, and some observations may not appear at all.
    • Create B such new datasets. These are called bootstrap datasets.
    • Build a model on each of the B datasets; we can test the models on the original training dataset.
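The sampling step above can be sketched in a few lines of numpy (the dataset here is just a stand-in array of indices, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
train = np.arange(N)  # stand-in for a training dataset of size N

# One bootstrap dataset: N draws with replacement from the training data
boot = rng.choice(train, size=N, replace=True)

# Repeated observations are expected, and some points never appear
n_unique = len(np.unique(boot))
print(len(boot), n_unique)  # n_unique is below N (roughly 63% of points appear)
```

Repeating this draw B times yields the B bootstrap datasets described above.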

Bootstrap Example


  • We have a training dataset of size 500.
  • Bootstrap Data-1:
    • Create a dataset of size 500. To create it, draw a random point, note it down, then put it back. Draw another point and repeat this process 500 times. This gives a dataset of size 500; call it Bootstrap Data-1.
  • Multiple bootstrap datasets
    • Repeat the procedure in the previous step multiple times, say 200 times. We then have 200 bootstrap datasets.
  • We can build models on these 200 bootstrap datasets, and the average error gives a good idea of the overall error. We can even use the original training data as the test data for each of the models.
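The 200-dataset procedure above can be sketched with scikit-learn; the toy dataset, tree depth, and use of `sklearn.utils.resample` are assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

# Toy data standing in for the training set of size 500 (an assumption)
X, y = make_classification(n_samples=500, random_state=0)

B = 200  # number of bootstrap datasets
rng = np.random.RandomState(0)
scores = []
for b in range(B):
    # Draw one bootstrap dataset: 500 rows sampled with replacement
    Xb, yb = resample(X, y, replace=True, n_samples=len(X), random_state=rng)
    model = DecisionTreeClassifier(max_depth=4, random_state=0).fit(Xb, yb)
    # Test each model on the original training data
    scores.append(model.score(X, y))

print(np.mean(scores))  # average accuracy over the 200 bootstrap models
```

The average of the 200 scores is the bootstrap estimate of overall accuracy.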

LAB: Bootstrap Cross Validation

  • Draw a bootstrap sample with a sufficient sample size
  • Build a tree model and get an estimate of the true accuracy of the model


In [42]:
# Defining the tree parameters
# (the max_depth value below is an assumption; the original line was cut off)
from sklearn import tree
tree_BS = tree.DecisionTreeClassifier(criterion='gini', max_depth=5)
In [43]:
# Defining the bootstrap variable for 10 random samples
# (Bootstrap lived in the old sklearn.cross_validation module, which has
#  since been removed from scikit-learn)
bootstrap = cross_validation.Bootstrap(len(X), n_iter=10, random_state=0)
In [44]:
### checking the accuracy of the Bootstrap models ###
BS_score = cross_validation.cross_val_score(tree_BS, X, y, cv=bootstrap)
BS_score
array([ 0.8658,  0.8699,  0.8658,  0.8655,  0.8694,  0.8741,  0.8689,
        0.8689,  0.8639,  0.8672])
In [45]:
# Expected accuracy according to bootstrap validation
BS_score.mean()

With 10 bootstrap samples we can expect an accuracy of about 86.79%, the mean of the scores above.
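Since the `sklearn.cross_validation.Bootstrap` iterator used in the lab has been removed from scikit-learn, a rough modern equivalent can be sketched with plain numpy; the toy dataset, tree depth, and the choice to score on out-of-bag points are all assumptions for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the lab dataset (an assumption)
X, y = make_classification(n_samples=1000, random_state=1)
rng = np.random.default_rng(1)

scores = []
for _ in range(10):  # 10 bootstrap samples, as in the lab
    idx = rng.integers(0, len(X), size=len(X))   # draw with replacement
    oob = np.setdiff1d(np.arange(len(X)), idx)   # out-of-bag (unsampled) rows
    model = DecisionTreeClassifier(criterion='gini', max_depth=5,
                                   random_state=1).fit(X[idx], y[idx])
    # Score each model on the points that did not enter its bootstrap sample
    scores.append(model.score(X[oob], y[oob]))

print(np.round(np.mean(scores), 4))  # bootstrap estimate of accuracy
```

Scoring on out-of-bag points is slightly more conservative than scoring on the full training data, which is why it is a common default in modern practice.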


  • We studied
    • Validating a model, types of data & types of errors
    • The problem of overfitting & the problem of underfitting
    • Bias-variance tradeoff
    • Cross validation & bootstrapping
  • Training error is what we see, and it is not the true performance metric
  • Test error plays a vital role in model selection
  • R-squared, Adj-R-squared, Accuracy, ROC, AUC, AIC and BIC can be used to get an idea of the training error
  • Cross validation and bootstrapping techniques give us an idea of the test error
  • Choose the model based on a combination of AIC, cross validation and bootstrapping results
  • Bootstrap is widely used in ensemble models & random forests
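As a minimal sketch of that last point, scikit-learn's BaggingClassifier and RandomForestClassifier both fit each tree on a bootstrap sample of the training data; the toy dataset and estimator counts below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data for illustration
X, y = make_classification(n_samples=500, random_state=0)

# Bagging: each tree sees a bootstrap sample (bootstrap=True is the default);
# oob_score=True scores each tree on its out-of-bag points for free
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                        bootstrap=True, oob_score=True,
                        random_state=0).fit(X, y)
print(bag.oob_score_)

# Random forests do the same bootstrap resampling internally
rf = RandomForestClassifier(n_estimators=50, oob_score=True,
                            random_state=0).fit(X, y)
print(rf.oob_score_)
```

The out-of-bag score is itself a bootstrap-style estimate of test accuracy, tying the ensemble methods back to the validation idea in this post.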