204.7.6 Practice : Random Forest

Building a Random Forest model using Python.

Link to the previous post : https://statinfer.com/204-7-5-the-random-forest/
Let’s put the Random Forest concept into practice using Python.

Practice : Random Forest

  • Dataset: /Car Accidents IOT/Train.csv
  • Goal: predict whether an accident is fatal (the Fatal column).
  • Build a decision tree model on the training data.
  • On the test data, calculate the classification error and accuracy.
  • Build a random forest model on the training data.
  • On the test data, calculate the classification error and accuracy.
  • How much does the Random Forest model improve on the single tree?
In [10]:
#Importing dataset
import pandas as pd

car_train=pd.read_csv("datasets\\Car Accidents IOT\\train.csv")
car_test=pd.read_csv("datasets\\Car Accidents IOT\\test.csv")
In [11]:
from sklearn import tree

# Predictor columns: positions 1 to 21 of the training data
var=list(car_train.columns[1:22])
c=car_train[var]
# Target column
d=car_train['Fatal']

### Building a decision tree on the training data ###
clf = tree.DecisionTreeClassifier()
clf.fit(c,d)
Out[11]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')
In [12]:
#####predicting on test data ####
tree_predict=clf.predict(car_test[var])
In [13]:
from sklearn.metrics import confusion_matrix###for using confusion matrix###
cm1 = confusion_matrix(car_test[['Fatal']],tree_predict)
print(cm1)
[[3244  648]
 [ 695 4478]]
In [14]:
##### Calculate accuracy from the confusion matrix: correct predictions / all predictions
total1=cm1.sum()
accuracy_tree=(cm1[0,0]+cm1[1,1])/total1
accuracy_tree
Out[14]:
0.85184776613348046
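The exercise also asks for the classification error, which is the share of misclassified cases, or 1 minus accuracy. A minimal sketch follows, assuming the confusion matrix printed above is re-entered by hand; the helper name accuracy_and_error is hypothetical and not part of scikit-learn.

```python
import numpy as np

# Confusion matrix printed above for the single tree
# (rows: actual class, columns: predicted class).
cm_tree = np.array([[3244, 648],
                    [695, 4478]])

def accuracy_and_error(cm):
    """Return (accuracy, classification error) for a square confusion matrix."""
    correct = np.trace(cm)   # diagonal entries are the correct predictions
    total = cm.sum()         # every test case appears exactly once
    accuracy = correct / total
    return accuracy, 1 - accuracy

acc, err = accuracy_and_error(cm_tree)
print("accuracy=%.4f error=%.4f" % (acc, err))
```

For the tree above this gives an accuracy of about 0.8518 and a classification error of about 0.1482.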
In [16]:
### accuracy_score() gives the same result as the confusion matrix calculation
from sklearn.metrics import accuracy_score
accuracy_score(car_test[['Fatal']],tree_predict, normalize=True, sample_weight=None)
Out[16]:
0.85184776613348046
In [17]:
####building a random forest classifier on training data#####
from sklearn.ensemble import RandomForestClassifier
# max_features='sqrt' matches the older 'auto' setting for classifiers, which newer scikit-learn releases no longer accept
forest=RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='sqrt', max_leaf_nodes=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False, class_weight=None)

forest.fit(c,d)
Out[17]:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
In [18]:
###predicting on test data with RF model
forestpredict_test=forest.predict(car_test[var])
e=car_test['Fatal']
In [19]:
###check the accuracy on test data
from sklearn.metrics import confusion_matrix###for using confusion matrix###
cm2 = confusion_matrix(car_test[['Fatal']],forestpredict_test)
print(cm2)
total2=cm2.sum()
##### Calculate accuracy from the confusion matrix: correct predictions / all predictions
accuracy_forest=(cm2[0,0]+cm2[1,1])/total2
accuracy_forest
[[3383  509]
 [ 471 4702]]
Out[19]:
0.89189189189189189
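  • In this run, the Random Forest raises test accuracy from about 0.852 (single tree) to about 0.892, a gain of roughly 4 percentage points. Equivalently, the classification error falls from about 14.8% to about 10.8%. Because random_state is not fixed, the exact figures will vary slightly between runs.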
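The comparison can be made explicit by computing both error rates from the two confusion matrices reported above. This is a small sketch in plain Python; the matrices are copied from the printed output, and the helper name error_rate is hypothetical.

```python
# Confusion matrices reported above (rows: actual class, columns: predicted class).
cm_tree = [[3244, 648], [695, 4478]]
cm_forest = [[3383, 509], [471, 4702]]

def error_rate(cm):
    """Share of test cases that fall off the diagonal of a 2x2 confusion matrix."""
    total = sum(sum(row) for row in cm)
    wrong = cm[0][1] + cm[1][0]
    return wrong / total

err_tree = error_rate(cm_tree)
err_forest = error_rate(cm_forest)

# Absolute gain in accuracy equals the absolute drop in error.
abs_gain = err_tree - err_forest
# Relative reduction: what fraction of the tree's mistakes the forest removes.
rel_reduction = abs_gain / err_tree

print("tree error=%.4f forest error=%.4f" % (err_tree, err_forest))
print("absolute gain=%.4f relative error reduction=%.1f%%"
      % (abs_gain, 100 * rel_reduction))
```

On these figures the forest makes 980 mistakes against the tree's 1343, which is about 27% fewer errors on the same 9065 test cases.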

The next post is about boosting.
Link to the next post : https://statinfer.com/204-7-7-boosting/
