Link to the previous post : https://statinfer.com/204-7-5-the-random-forest/
Let's put the Random Forest concept into practice using Python.

Practice : Random Forest

Dataset: /Car Accidents IOT/Train.csv
Build a decision tree model on the training data to predict whether an accident is fatal.
On the test data, calculate the classification error and accuracy.
Build a random forest model on the training data.
On the test data, calculate the classification error and accuracy.
How much does the Random Forest model improve on the single tree?

In [10]:

#Importing dataset
import pandas as pd

# Forward slashes in the path work on Windows, macOS and Linux
car_train=pd.read_csv("datasets/Car Accidents IOT/train.csv")
car_test=pd.read_csv("datasets/Car Accidents IOT/test.csv")

In [11]:

from sklearn import tree

# Predictor columns: positions 1 to 21 of the training data
var=list(car_train.columns[1:22])
c=car_train[var]        # predictors
d=car_train['Fatal']    # target

###building Decision tree on the training data ####
clf = tree.DecisionTreeClassifier()
clf.fit(c,d)

Out[11]:

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=None, splitter='best')

In [12]:

#####predicting on test data ####
tree_predict=clf.predict(car_test[var])

In [13]:

### confusion matrix: rows are actual classes, columns are predicted classes
from sklearn.metrics import confusion_matrix
cm1 = confusion_matrix(car_test['Fatal'],tree_predict)
print(cm1)

[[3244  648]
 [ 695 4478]]

In [14]:

#####from confusion matrix calculate accuracy
total1=cm1.sum()
accuracy_tree=(cm1[0,0]+cm1[1,1])/total1
accuracy_tree

Out[14]:

0.85184776613348046
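The exercise also asks for the classification error, which is simply 1 minus the accuracy. The sketch below is one way to get both numbers from a confusion matrix. It uses plain nested lists rather than the NumPy array returned by `confusion_matrix`, and the helper name `accuracy_and_error` is made up for illustration; the counts are the ones printed for the decision tree above.

```python
def accuracy_and_error(cm):
    """Return (accuracy, classification error) for a square confusion matrix.

    cm is a list of rows; cm[i][j] counts cases of actual class i
    predicted as class j, so the diagonal holds the correct predictions.
    """
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    accuracy = correct / total
    return accuracy, 1 - accuracy


# Counts copied from the decision tree confusion matrix above
cm_tree = [[3244, 648],
           [695, 4478]]

acc_tree, err_tree = accuracy_and_error(cm_tree)
print(round(acc_tree, 4), round(err_tree, 4))
```

The same helper can be reused unchanged for the random forest confusion matrix.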

In [16]:

### accuracy_score() gives the same result as the confusion matrix calculation
from sklearn.metrics import accuracy_score
accuracy_score(car_test['Fatal'],tree_predict)

Out[16]:

0.85184776613348046

In [17]:

####building a random forest classifier on training data#####
from sklearn.ensemble import RandomForestClassifier
# max_features='sqrt' is what the older 'auto' setting meant for classifiers;
# recent scikit-learn releases no longer accept 'auto'
forest=RandomForestClassifier(n_estimators=10, criterion='gini', max_depth=None,
                              min_samples_split=2, min_samples_leaf=1,
                              min_weight_fraction_leaf=0.0, max_features='sqrt',
                              max_leaf_nodes=None, bootstrap=True, oob_score=False,
                              n_jobs=1, random_state=None, verbose=0,
                              warm_start=False, class_weight=None)

forest.fit(c,d)

Out[17]:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False)
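The output above shows `random_state=None`, which means each fit draws different bootstrap samples and feature subsets, so the accuracy can change a little between runs. A minimal sketch of making the forest reproducible by fixing the seed follows. It runs on synthetic data from `make_classification` as a stand-in (not the Car Accidents IOT data), and the helper `fit_and_score` and the seed value 42 are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with 21 predictors (NOT the Car Accidents IOT dataset)
X, y = make_classification(n_samples=1000, n_features=21, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def fit_and_score(seed):
    """Fit a 10-tree forest with the given seed and return its test accuracy."""
    forest = RandomForestClassifier(n_estimators=10, random_state=seed)
    forest.fit(X_train, y_train)
    return accuracy_score(y_test, forest.predict(X_test))


# Two fits with the same seed build the same trees and give the same accuracy
acc_first = fit_and_score(42)
acc_second = fit_and_score(42)
print(acc_first == acc_second)
```

Passing the same `random_state` to the classifier in the cell above would make its reported accuracy repeatable in the same way.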

In [18]:

###predicting on test data with RF model
forestpredict_test=forest.predict(car_test[var])
e=car_test['Fatal']

In [19]:

###check the accuracy on test data
from sklearn.metrics import confusion_matrix
cm2 = confusion_matrix(car_test['Fatal'],forestpredict_test)
print(cm2)
total2=cm2.sum()
#####from confusion matrix calculate accuracy
accuracy_forest=(cm2[0,0]+cm2[1,1])/total2
accuracy_forest

[[3383  509]
 [ 471 4702]]

Out[19]:

0.89189189189189189

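The random forest raises test accuracy from about 0.852 for the single tree to about 0.892, a gain of roughly 4 percentage points. Equivalently, the classification error (1 - accuracy) falls from about 0.148 to about 0.108. Because random_state is not fixed, the exact figures will vary slightly from run to run.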
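To answer the last question of the exercise with numbers, the improvement can be expressed both as a gain in accuracy and as a relative reduction in misclassified cases. The following sketch works directly from the two confusion matrices printed above; the function and variable names are illustrative only.

```python
def count_errors(cm):
    """Return (misclassified cases, total cases) for a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return total - correct, total


# Counts copied from the two confusion matrices printed in this post
cm_tree = [[3244, 648], [695, 4478]]
cm_forest = [[3383, 509], [471, 4702]]

tree_errors, n_cases = count_errors(cm_tree)
forest_errors, _ = count_errors(cm_forest)

# Absolute gain in accuracy (same as the absolute drop in error rate)
accuracy_gain = (tree_errors - forest_errors) / n_cases

# Share of the single tree's mistakes that the forest avoids
relative_error_reduction = (tree_errors - forest_errors) / tree_errors

print(tree_errors, forest_errors)
print(round(accuracy_gain, 4), round(relative_error_reduction, 4))
```

Reporting the relative reduction alongside the accuracy gain helps here, because a few points of accuracy can correspond to a sizeable share of the mistakes being removed.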

The next post is about boosting.
Link to the next post : https://statinfer.com/204-7-7-boosting/

21st June 2017

204.7.6 Practice : Random Forest

Building a Random Forest model using Python.

Statinfer
