
203.7.9 Boosting Conclusion

When does boosting not work?

When does an ensemble not work?

In the previous section, we studied Practice: Boosting.

  • The models have to be independent; we can’t build the same model multiple times and expect the error to reduce.
  • We may have to bring in the independence by choosing subsets of the data, or subsets of the features, while building the individual models.
  • An ensemble may backfire if we use dependent models that are already less accurate; the final ensemble might turn out to be an even worse model.
  • Yes, there is a small disclaimer in the “Wisdom of the Crowd” theory: we need good, independent individuals. If we collate dependent individuals with poor knowledge, we might end up with an even worse ensemble.
  • For example, suppose we build three models, where model-1 and model-2 are bad and model-3 is good. Most of the time, the ensemble’s majority vote will follow the combined output of model-1 and model-2, as shown in the sketch below.
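To make this concrete, here is a minimal sketch, assuming a synthetic dataset and scikit-learn’s VotingClassifier; the dataset, model choices, and parameters are illustrative assumptions, not the lesson’s own practice code. Two nearly identical weak stumps outvote a single deeper, more accurate tree under hard (majority) voting, so the ensemble inherits the weak models’ behaviour.

```python
# Minimal sketch: two dependent weak models outvote one good model (illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=1)

# Two weak, highly correlated models (depth-1 stumps) and one stronger model.
bad1 = DecisionTreeClassifier(max_depth=1, random_state=1)
bad2 = DecisionTreeClassifier(max_depth=1, random_state=2)
good = DecisionTreeClassifier(max_depth=10, random_state=3)

# Hard (majority) voting: the two similar weak models can outvote the good one.
ensemble = VotingClassifier(estimators=[("bad1", bad1), ("bad2", bad2),
                                        ("good", good)], voting="hard")

for name, model in [("bad-1", bad1), ("bad-2", bad2),
                    ("good", good), ("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```

The ensemble accuracy typically lands close to the two weak models rather than the good one, which is exactly the failure mode described above.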

Conclusion

  • Ensemble methods are among the most widely used methods these days. With modern computing power, it is not really a huge task to build multiple models.
  • Both bagging and boosting do a good job of reducing bias and variance: bagging mainly attacks variance, while boosting mainly attacks bias.
  • Random forests are relatively fast; since we are building many small trees, they do not put a lot of pressure on the computing machine.
  • Random forests can also give variable importance. We need to be careful with categorical features: random forests tend to give higher importance to variables with a higher number of levels, as illustrated in the first sketch after this list.
  • In boosted algorithms, we may have to restrict the number of iterations to avoid overfitting; the second sketch after this list shows one way to choose the iteration count on a validation set.
  • Ensemble models are typically the final effort of a data scientist while building the most suitable predictive model for the data.
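The following is a minimal sketch, on assumed synthetic data, of the cardinality bias mentioned above: a pure-noise column with 100 levels tends to receive a higher impurity-based importance than a pure-noise column with 2 levels, simply because it offers more candidate split points. Column names and parameters are illustrative.

```python
# Minimal sketch: impurity-based importance favours high-cardinality noise.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
n = 5000

df = pd.DataFrame({
    "useful_numeric": rng.normal(size=n),               # carries the real signal
    "random_100_levels": rng.randint(0, 100, size=n),   # pure noise, 100 levels
    "random_2_levels": rng.randint(0, 2, size=n),        # pure noise, 2 levels
})
# The target depends only on the numeric feature.
y = (df["useful_numeric"] + 0.5 * rng.normal(size=n) > 0).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, y)
for name, imp in zip(df.columns, rf.feature_importances_):
    print(f"{name:20s} {imp:.3f}")
# The 100-level noise column usually scores noticeably higher than the
# 2-level noise column, even though neither has any predictive value.
```

And here is a minimal sketch, again on assumed synthetic data, of restricting the number of boosting iterations: scikit-learn’s staged_predict lets us track validation accuracy after each iteration and keep only as many trees as actually help, instead of letting the model boost all the way to a large, overfit tree count.

```python
# Minimal sketch: pick the number of boosting iterations on a validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           flip_y=0.2, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                  random_state=7)

gbm = GradientBoostingClassifier(n_estimators=500, learning_rate=0.1,
                                 max_depth=3, random_state=7)
gbm.fit(X_train, y_train)

# Validation accuracy after each boosting iteration (staged predictions).
val_acc = [accuracy_score(y_val, pred) for pred in gbm.staged_predict(X_val)]
best_iter = int(np.argmax(val_acc)) + 1
print("best number of iterations:", best_iter,
      "validation accuracy:", round(max(val_acc), 3))

# Refit with the restricted number of iterations for the final model.
gbm_final = GradientBoostingClassifier(n_estimators=best_iter, learning_rate=0.1,
                                       max_depth=3, random_state=7)
gbm_final.fit(X_train, y_train)
```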
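Validation accuracy usually rises for the first batch of iterations and then flattens or drops as the model starts fitting noise; stopping at the best iteration is the simple guard against that overfitting.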
