In this series of posts, we will cover Random Forests and Boosting and implement what we learn in Python.
In this first post, we will cover the Wisdom of Crowds, which is the key concept behind Random Forests and ensemble learning.
The Wisdom of Crowds
- One should not expend energy trying to identify an expert within a group but instead rely on the group’s collective wisdom, provided that opinions are independent and some knowledge of the truth resides with some group members (Surowiecki).
- So instead of trying to build one great model, it's better to build several independent, moderately good models and take their average as the final prediction.
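To get a feel for why this can work, here is a minimal sketch that computes how often a majority vote is right. For class labels, taking the average amounts to taking a majority vote. It rests on two idealized assumptions that are ours, not Surowiecki's: every model is correct with the same probability `p`, and the models make their mistakes fully independently. Real models trained on the same data are usually correlated, so treat the numbers as an upper bound on what to expect. The function name `majority_vote_accuracy` is just an illustrative choice.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of n_models is correct.

    Assumes (idealized) that each model is correct with probability p
    and that the models err independently of each other.
    """
    # Smallest number of correct votes that forms a strict majority.
    min_correct = n_models // 2 + 1
    # Binomial tail: sum the probability of every outcome with enough correct votes.
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(min_correct, n_models + 1)
    )


# Compare a single moderate model with crowds of 5 and 25 such models.
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

Under these assumptions, five models that are each 70% accurate give a majority vote that is right about 83.7% of the time, and the figure keeps rising as more independent models are added.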
What is Ensemble Learning
- Imagine a classification problem with two classes in the target, +1 and -1.
- Imagine that we have built the best possible decision tree, and it has 91% accuracy.
- Let x be a new data point that our decision tree predicts to be +1. Is there a way to do better than 91% using the same data?
- Let's build three more models on the same data and see whether we can improve the performance.
- We now have four models on the same dataset, each with a different accuracy. Unfortunately, there seems to be no real improvement in accuracy.
- What about the prediction for the data point x?
- Except for the decision tree, all the other algorithms predict the class of x as -1. Intuitively, we would like to believe that the class of x is -1.
- The combined voting model seems to have a lower error than each of the individual models. This is the core philosophy of ensemble learning.
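The voting step for the data point x can be sketched in a few lines of Python. This is only an illustration of the idea: the three models other than the decision tree are not named here, so `model_2`, `model_3` and `model_4` are placeholder names, and the `majority_vote` helper is our own.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that most models predicted.

    If two labels tie, the one that appears first in the list wins.
    """
    counts = Counter(predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Predictions of the four models for the new data point x.
# Only the decision tree comes from the text; the other names are placeholders.
votes = {
    "decision_tree": +1,
    "model_2": -1,
    "model_3": -1,
    "model_4": -1,
}

ensemble_prediction = majority_vote(list(votes.values()))
print(ensemble_prediction)  # -1
```

Three of the four models vote for -1, so the ensemble overrules the decision tree even though the tree may be the most accurate single model.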