Statinfer

203.7.6 Practice : Random Forest

Building a Random Forest model using R.

In the previous section, we studied Random Forest.

Let’s put the concept of Random Forest into practice using R.

LAB: Random Forest

  • Dataset: /Car Accidents IOT/Train.csv and /Car Accidents IOT/Test.csv
  • Predict the fatality of an accident (the Fatal column).
  • Build a decision tree model on the training data.
  • On the test data, calculate the classification error and accuracy.
  • Build a random forest model on the training data.
  • On the test data, calculate the classification error and accuracy.
  • How much does the random forest model improve on the single tree?

Solution

#Data Import
train<- read.csv("C:/Amrita/Datavedi/Car Accidents IOT/Train.csv")
test<- read.csv("C:/Amrita/Datavedi/Car Accidents IOT/Test.csv")

dim(train)
## [1] 15109    23
head(train)
##   Fatal      S1       S2       S3  S4       S5 S6 S7 S8 S9      S10 S11
## 1     1 36.2247 10.77330 0.243897 596 100.6710  0  0  1 28 0.016064 313
## 2     1 35.7343 17.45510 0.243897 600 100.0000  0  0  1 14 0.015812 319
## 3     1 31.6561  7.61366 0.308763 604  99.3377  0  0  1  4 0.015560 323
## 4     1 33.8320 13.11190 0.293195 616  97.4026  0  0  1  8 0.016001 320
## 5     1 42.5138 13.99850 0.259465 632  94.9367  0  0  1  8 0.016064 322
## 6     1 36.1261 14.85930 0.278925 600 100.0000  0  0  1  4 0.015749 314
##   S12 S13 S14 S15   S16  S17     S18 S19  S20 S21     S22
## 1   1   1  57   0 0.280  240 5.99375   0  0.0   4 14.9382
## 2   1   1  57   0 0.175  240 5.99375   0  0.0   4 14.8827
## 3   1   1  58   0 0.280  240 5.99375   0  0.0   4 14.6005
## 4   1   1  58   0 0.385  240 4.50625   0 13.0   4 14.6782
## 5   1   1  57   0 0.070  240 5.99375   0 19.5   4 15.3461
## 6   1   1  58   0 0.175 1008 4.50625   0 23.9   4 15.0559
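###Decision Tree
library(rpart)
#Classification tree: attempt a split only in nodes with at least 30 rows,
#and keep a split only if it improves the fit by at least cp=0.03
crash_model_ds<-rpart(Fatal ~ ., method="class", control=rpart.control(minsplit=30, cp=0.03), data=train)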

#Training accuracy
predicted_y<-predict(crash_model_ds, type="class")
table(predicted_y)
## predicted_y
##    0    1 
## 5745 9364
library(caret) #provides confusionMatrix(), which expects factors with the same levels
confusionMatrix(predicted_y,as.factor(train$Fatal))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 4735 1010
##          1 1581 7783
##                                           
##                Accuracy : 0.8285          
##                  95% CI : (0.8224, 0.8345)
##     No Information Rate : 0.582           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.643           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.7497          
##             Specificity : 0.8851          
##          Pos Pred Value : 0.8242          
##          Neg Pred Value : 0.8312          
##              Prevalence : 0.4180          
##          Detection Rate : 0.3134          
##    Detection Prevalence : 0.3802          
##       Balanced Accuracy : 0.8174          
##                                           
##        'Positive' Class : 0               
## 
#Accuracy on test data
predicted_test_ds<-predict(crash_model_ds, test, type="class")
confusionMatrix(predicted_test_ds,as.factor(test$Fatal))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 2897  561
##          1  995 4612
##                                           
##                Accuracy : 0.8284          
##                  95% CI : (0.8204, 0.8361)
##     No Information Rate : 0.5707          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6448          
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.7443          
##             Specificity : 0.8916          
##          Pos Pred Value : 0.8378          
##          Neg Pred Value : 0.8225          
##              Prevalence : 0.4293          
##          Detection Rate : 0.3196          
##    Detection Prevalence : 0.3815          
##       Balanced Accuracy : 0.8179          
##                                           
##        'Positive' Class : 0               
## 
###Random Forest
library(randomForest)
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
#200 trees; mtry is the number of predictors tried as candidates at each split
rf_model <- randomForest(as.factor(Fatal) ~ ., ntree=200, mtry=ncol(train)/3, data=train)

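#Out-of-bag (OOB) accuracy on the training data
#predict() without newdata returns the OOB predictions, not the fitted values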
predicted_y<-predict(rf_model)
table(predicted_y)
## predicted_y
##    0    1 
## 5921 9188
confusionMatrix(predicted_y,as.factor(train$Fatal))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 5600  321
##          1  716 8472
##                                           
##                Accuracy : 0.9314          
##                  95% CI : (0.9272, 0.9353)
##     No Information Rate : 0.582           
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8577          
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.8866          
##             Specificity : 0.9635          
##          Pos Pred Value : 0.9458          
##          Neg Pred Value : 0.9221          
##              Prevalence : 0.4180          
##          Detection Rate : 0.3706          
##    Detection Prevalence : 0.3919          
##       Balanced Accuracy : 0.9251          
##                                           
##        'Positive' Class : 0               
## 
#Accuracy on test data
predicted_test_rf<-predict(rf_model,test, type="class")
confusionMatrix(predicted_test_rf,as.factor(test$Fatal))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 3479  192
##          1  413 4981
##                                           
##                Accuracy : 0.9333          
##                  95% CI : (0.9279, 0.9383)
##     No Information Rate : 0.5707          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.8628          
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.8939          
##             Specificity : 0.9629          
##          Pos Pred Value : 0.9477          
##          Neg Pred Value : 0.9234          
##              Prevalence : 0.4293          
##          Detection Rate : 0.3838          
##    Detection Prevalence : 0.4050          
##       Balanced Accuracy : 0.9284          
##                                           
##        'Positive' Class : 0               
## 

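On the test data, accuracy improves from 0.8284 with the single decision tree to 0.9333 with the random forest, so the classification error falls from about 17.2% to about 6.7%.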
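
To quantify the improvement, accuracy and classification error can be recomputed from the two test-set confusion matrices printed above. The following is a minimal sketch in base R. The names cm_tree, cm_rf and accuracy are illustrative and not part of the original solution, and the counts are typed in by hand from the caret output rather than read from the fitted models.

```r
# Test-set confusion matrices copied from the caret output above
# (rows = prediction, columns = reference; classes are 0 and 1).
# matrix() fills column by column, so each pair of counts is one reference class.
cm_tree <- matrix(c(2897, 995, 561, 4612), nrow = 2,
                  dimnames = list(Prediction = c("0", "1"),
                                  Reference  = c("0", "1")))
cm_rf   <- matrix(c(3479, 413, 192, 4981), nrow = 2,
                  dimnames = list(Prediction = c("0", "1"),
                                  Reference  = c("0", "1")))

# Accuracy is the share of correct predictions (the diagonal);
# classification error is its complement.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

acc_tree <- accuracy(cm_tree)
acc_rf   <- accuracy(cm_rf)

# Side-by-side summary of both models
summary_table <- data.frame(
  model    = c("decision tree", "random forest"),
  accuracy = round(c(acc_tree, acc_rf), 4),
  error    = round(1 - c(acc_tree, acc_rf), 4)
)
print(summary_table)

# Absolute gain in accuracy from the single tree to the forest
improvement <- acc_rf - acc_tree
print(round(improvement, 4))
```

The same accuracy function works on the table inside any caret confusionMatrix result (for example, confusionMatrix(...)$table), so the hand-typed counts are only a convenience here.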

The next post is on Boosting.
