204.7.5 The Random Forest

Getting started with Random Forest.
Random Forest

  • Just as many trees form a forest, many decision tree models together form a Random Forest model.
  • Random forest is a specific case of the bagging methodology: it is bagging applied to decision trees.
  • In random forest we induce two types of randomness:
    • First, we take bootstrap samples of the training data and build a decision tree on each sample.
    • Second, while building the individual trees on the bootstrap samples, we use only a random subset of the features.
  • Random forests are very stable; their accuracy is often as good as that of SVMs and sometimes better.

Random Forest Algorithm

  • Start with the training dataset D, which has t features.
  • Draw k bootstrap samples from dataset D.
  • For each bootstrap sample i:
    • Build a decision tree model Mi using only p features (where p << t).
    • Each tree has maximal strength: it is fully grown and not pruned.
  • We will have a total of k decision trees M1, M2, ..., Mk.
  • Each of these trees is built on relatively different training data and a different set of features.
  • Take a majority vote over the trees for the final classifier output, or take the average of their predictions for regression output.
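
The steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the helper names (gini, build_tree, fit_forest, predict_forest), the Gini split criterion, and the synthetic data are all assumptions made for this example. It follows the description in this post, where each tree gets its own subset of p features.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow an unpruned tree that splits only on the given feature indices."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    best = None
    for f in features:
        # Every unique value except the largest is a candidate threshold,
        # so both sides of a split are always non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # rows are identical on these features: majority leaf
        return Counter(labels).most_common(1)[0][0]
    _, f, threshold, left, right = best
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left], features),
            build_tree([rows[i] for i in right], [labels[i] for i in right], features))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, labels, k, p, seed=0):
    """Build k trees, each on a bootstrap sample and p random features."""
    rng = random.Random(seed)
    n, t = len(rows), len(rows[0])
    forest = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feats = rng.sample(range(t), p)             # p distinct features out of t
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees (classification)."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical data: 120 rows, t = 9 features, label depends on the first two.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(9)] for _ in range(120)]
y = [1 if row[0] + row[1] > 1.0 else 0 for row in X]

forest = fit_forest(X, y, k=15, p=3)
train_acc = sum(predict_forest(forest, r) == c for r, c in zip(X, y)) / len(X)
print(f"trees: {len(forest)}, training accuracy: {train_acc:.2f}")
```

For regression, predict_forest would return the average of the tree outputs instead of the most common label. An odd k is used here so that a two-class vote cannot tie.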

The Random Factors in Random Forest

  • The most important aspect of random forest is the randomness induced into the bagging of trees. There are two major sources of randomness:
    • Randomness in data: bootstrapping makes sure that the data in any two samples is somewhat different.
    • Randomness in features: while building the decision trees on the bootstrap samples, we consider only a random subset of the features.
  • Why induce the randomness?
    • The major trick of ensemble models is the independence of the models.
    • If we take the same data and build the same model 100 times, we will not see any improvement.
    • To make our decision trees as independent as possible, we give each tree its own random sample of rows and its own random set of features.
    • As a rule of thumb for p, we can take the square root of the number of features if t is very large; otherwise p = t/3.
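
Both sources of randomness are easy to see in a few lines of Python. The sketch below assumes a hypothetical dataset of 1000 rows and 20 features; the function names and the rounding of p to a whole number are choices made for this example only.

```python
import math
import random

rng = random.Random(42)       # fixed seed so the sketch is repeatable

n_rows, t = 1000, 20          # hypothetical dataset size
p_sqrt = round(math.sqrt(t))  # rule of thumb: p = sqrt(t), rounded (assumption)
p_third = t // 3              # alternative rule of thumb: p = t/3, rounded down


def bootstrap_indices(n, rng):
    """Randomness in data: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def feature_subset(t, p, rng):
    """Randomness in features: pick p distinct feature indices out of t."""
    return sorted(rng.sample(range(t), p))


sample_a = bootstrap_indices(n_rows, rng)
sample_b = bootstrap_indices(n_rows, rng)
feats_a = feature_subset(t, p_sqrt, rng)
feats_b = feature_subset(t, p_sqrt, rng)

# Share of the original rows that appear at least once in one bootstrap sample.
unique_share = len(set(sample_a)) / n_rows
# Share of the original rows that appear in both bootstrap samples.
shared_rows = len(set(sample_a) & set(sample_b)) / n_rows

print("p (sqrt rule):", p_sqrt, " p (t/3 rule):", p_third)
print("features for tree A:", feats_a)
print("features for tree B:", feats_b)
print(f"unique rows in sample A: {unique_share:.1%}")
print(f"rows shared by samples A and B: {shared_rows:.1%}")
```

A bootstrap sample of size n contains on average about 1 - 1/e, or roughly 63%, of the distinct original rows, which is why any two samples differ without any extra effort.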

Why Random Forest Works

  • For training data with 20 features, we build 100 decision trees with 5 features each, instead of a single large decision tree. The individual trees may be weak classifiers.
  • It's like building weak classifiers on subsets of the data. Grouping a large set of random trees generally produces an accurate model.
  • In this example, we have three simple classifiers.
  • m1 classifies anything above the line as +1 and below as -1; m2 classifies all the points above the line as -1 and below as +1; and m3 classifies everything on the left as -1 and on the right as +1.
  • Each of these models has a fair amount of misclassification error.
  • Together, these three weak models make a strong model.
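
A short calculation shows why voting helps. The sketch below assumes an idealized case: every classifier is correct with the same probability q, and the classifiers make their mistakes independently of one another. The function name and the values of q are illustrative, not taken from the example above.

```python
from math import comb


def majority_vote_accuracy(k, q):
    """Probability that more than half of k independent classifiers are
    correct, when each one is correct with probability q (binomial sum)."""
    need = k // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(k, j) * q**j * (1 - q) ** (k - j)
               for j in range(need, k + 1))


# Three weak classifiers, each right 70% of the time:
# 0.7^3 + 3 * 0.7^2 * 0.3 = 0.343 + 0.441 = 0.784
print(f"k=3,   q=0.7: {majority_vote_accuracy(3, 0.7):.3f}")

# Many weaker classifiers can still do better than a few stronger ones.
print(f"k=101, q=0.6: {majority_vote_accuracy(101, 0.6):.3f}")

# Classifiers that are no better than a coin flip gain nothing from voting.
print(f"k=101, q=0.5: {majority_vote_accuracy(101, 0.5):.3f}")
```

Real trees trained on the same dataset are never fully independent, so the actual gain is smaller than this calculation suggests. That is the reason random forest works to reduce the correlation between trees through bootstrapping and random feature subsets.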

The next post is a practice session on random forest.
