203.6.8 Digit Recognition using SVM

LAB: Digit Recognition using SVM

In the previous section, we studied Soft Margin Classification – Noisy Data and Validation.

Take an image of a handwritten single digit and determine what that digit is.
The data are normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
The data are in two zipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
Build an SVM model that can be used as the digit recognizer.
Use the test dataset to validate the true classification power of the model.
What is the final accuracy of the model?
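
The record layout described above (one label followed by 256 pixel values) can be sketched as follows. This is only an illustration: the record is synthetic, the pixel values are random numbers rather than real USPS data, and the variable names are made up for this example.

```r
# Build one synthetic record in the same layout: label first, then 256 pixel values.
# The values are made up for illustration; they are not taken from the USPS files.
set.seed(1)
record <- c(7, round(runif(256), 3))

label  <- record[1]                                               # digit id (0-9)
pixels <- matrix(record[-1], nrow = 16, ncol = 16, byrow = TRUE)  # 16 x 16 image, filled row by row

label        # 7
dim(pixels)  # 16 16
```

With `byrow = TRUE`, the first 16 values after the label become the first row of the image, the next 16 the second row, and so on.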

Solution

#Importing test and training data
digits_train <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.train.txt", quote="\"", comment.char="")
digits_test <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.test.txt", quote="\"", comment.char="")
dim(digits_train)

## [1] 7291  257

dim(digits_test)

## [1] 2007  257

#Let's see some images.
for(i in 1:6 )
{
data_row<-digits_train[i,-1]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
#image() draws matrix rows along the x-axis, so transpose and flip to show the digit upright
image(t(pixels)[,16:1], axes = FALSE)
title(main = paste("Label is" , digits_train[i,1]), font.main = 4)
}

#Are there any missing values?

sum(is.na(digits_train))

## [1] 0

sum(is.na(digits_test))

## [1] 0

#The first variable is label
table(digits_train$V1)

## 
##    0    1    2    3    4    5    6    7    8    9 
## 1194 1005  731  658  652  556  664  645  542  644

table(digits_test$V1)

## 
##   0   1   2   3   4   5   6   7   8   9 
## 359 264 198 166 200 160 170 147 166 177

########SVM Model Building 
library(e1071)

#Let's keep an eye on runtime
pc <- proc.time()

#Verify the code on a limited sample: the first 5000 rows
number.svm <- svm(V1 ~. , type="C", data = digits_train[1:5000,])

proc.time() - pc

##    user  system elapsed 
##   38.25    0.14   39.37

summary(number.svm)

## 
## Call:
## svm(formula = V1 ~ ., data = digits_train[1:5000, ], type = "C")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.00390625 
## 
## Number of Support Vectors:  2028
## 
##  ( 181 232 245 189 195 45 220 206 305 210 )
## 
## 
## Number of Classes:  10 
## 
## Levels: 
##  0 1 2 3 4 5 6 7 8 9
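
The summary above reports a radial kernel with gamma 0.00390625. That value equals 1/256, one over the number of pixel columns, which is what e1071 uses when gamma is not supplied. A minimal sketch of what the radial kernel computes is shown below. The helper name `rbf_kernel` and the two vectors are made up for illustration, and the sketch does not reproduce the input scaling that `svm()` applies by default.

```r
# Radial (RBF) kernel: K(u, v) = exp(-gamma * ||u - v||^2)
# rbf_kernel is a hypothetical helper written for this illustration.
rbf_kernel <- function(u, v, gamma) {
  exp(-gamma * sum((u - v)^2))
}

n_features <- 256          # 16 x 16 pixels
gamma <- 1 / n_features    # 0.00390625, matching the summary above

# Two made-up "images", for illustration only
u <- rep(0, n_features)
v <- rep(0.5, n_features)

rbf_kernel(u, u, gamma)    # identical inputs give 1
rbf_kernel(u, v, gamma)    # exp(-0.25), about 0.7788
```

The kernel value shrinks towards 0 as two images move further apart, and a larger gamma makes it shrink faster.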

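#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
#Recent versions of caret expect both arguments to be factors with the same levels
confusionMatrix(label_predicted,factor(digits_train[1:5000, 1], levels = levels(label_predicted)))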

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 847   0   0   0   0   0   1   0   0   0
##          1   0 674   1   0   1   0   1   0   0   0
##          2   0   0 484   0   0   1   0   0   0   0
##          3   0   0   1 392   0   0   0   0   1   1
##          4   0   0   2   0 429   0   0   1   0   0
##          5   0   0   0   1   0 350   1   0   2   0
##          6   0   0   0   0   1   1 475   0   0   0
##          7   0   0   0   0   0   0   0 459   1   2
##          8   0   0   0   2   0   0   0   0 383   0
##          9   0   0   0   0   3   0   0   1   0 481
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9948          
##                  95% CI : (0.9924, 0.9966)
##     No Information Rate : 0.1694          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9942          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            1.0000   1.0000   0.9918   0.9924   0.9885   0.9943
## Specificity            0.9998   0.9993   0.9998   0.9993   0.9993   0.9991
## Pos Pred Value         0.9988   0.9956   0.9979   0.9924   0.9931   0.9887
## Neg Pred Value         1.0000   1.0000   0.9991   0.9993   0.9989   0.9996
## Prevalence             0.1694   0.1348   0.0976   0.0790   0.0868   0.0704
## Detection Rate         0.1694   0.1348   0.0968   0.0784   0.0858   0.0700
## Detection Prevalence   0.1696   0.1354   0.0970   0.0790   0.0864   0.0708
## Balanced Accuracy      0.9999   0.9997   0.9958   0.9959   0.9939   0.9967
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity            0.9937   0.9957   0.9897   0.9938
## Specificity            0.9996   0.9993   0.9996   0.9991
## Pos Pred Value         0.9958   0.9935   0.9948   0.9918
## Neg Pred Value         0.9993   0.9996   0.9991   0.9993
## Prevalence             0.0956   0.0922   0.0774   0.0968
## Detection Rate         0.0950   0.0918   0.0766   0.0962
## Detection Prevalence   0.0954   0.0924   0.0770   0.0970
## Balanced Accuracy      0.9966   0.9975   0.9946   0.9965

table(label_predicted,digits_train[1:5000, 1])

##                
## label_predicted   0   1   2   3   4   5   6   7   8   9
##               0 847   0   0   0   0   0   1   0   0   0
##               1   0 674   1   0   1   0   1   0   0   0
##               2   0   0 484   0   0   1   0   0   0   0
##               3   0   0   1 392   0   0   0   0   1   1
##               4   0   0   2   0 429   0   0   1   0   0
##               5   0   0   0   1   0 350   1   0   2   0
##               6   0   0   0   0   1   1 475   0   0   0
##               7   0   0   0   0   0   0   0 459   1   2
##               8   0   0   0   2   0   0   0   0 383   0
##               9   0   0   0   0   3   0   0   1   0 481

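###Out-of-sample validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
#Recent versions of caret expect both arguments to be factors with the same levels
confusionMatrix(test_label_predicted,factor(digits_test[,1], levels = levels(test_label_predicted)))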

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 351   0   3   0   0   3   5   0   3   0
##          1   0 253   0   0   1   0   0   0   0   0
##          2   6   2 182   6   5   4   4   3   4   0
##          3   0   0   4 144   0   3   0   0   4   0
##          4   1   5   4   0 185   1   2   5   0   4
##          5   0   0   0  11   2 145   1   0   5   1
##          6   0   3   1   0   3   0 158   0   1   0
##          7   0   0   1   1   1   0   0 137   0   1
##          8   1   0   3   3   0   1   0   0 146   2
##          9   0   1   0   1   3   3   0   2   3 169
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9317          
##                  95% CI : (0.9198, 0.9424)
##     No Information Rate : 0.1789          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9233          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9777   0.9583  0.91919  0.86747  0.92500  0.90625
## Specificity            0.9915   0.9994  0.98121  0.99402  0.98783  0.98917
## Pos Pred Value         0.9616   0.9961  0.84259  0.92903  0.89372  0.87879
## Neg Pred Value         0.9951   0.9937  0.99107  0.98812  0.99167  0.99186
## Prevalence             0.1789   0.1315  0.09865  0.08271  0.09965  0.07972
## Detection Rate         0.1749   0.1261  0.09068  0.07175  0.09218  0.07225
## Detection Prevalence   0.1819   0.1266  0.10762  0.07723  0.10314  0.08221
## Balanced Accuracy      0.9846   0.9789  0.95020  0.93075  0.95641  0.94771
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.92941  0.93197  0.87952  0.95480
## Specificity           0.99565  0.99785  0.99457  0.99290
## Pos Pred Value        0.95181  0.97163  0.93590  0.92857
## Neg Pred Value        0.99348  0.99464  0.98920  0.99562
## Prevalence            0.08470  0.07324  0.08271  0.08819
## Detection Rate        0.07872  0.06826  0.07275  0.08421
## Detection Prevalence  0.08271  0.07025  0.07773  0.09068
## Balanced Accuracy     0.96253  0.96491  0.93704  0.97385

#####Model on Full Data 
pc <- proc.time()
number.svm <- svm(V1 ~. , type="C", data = digits_train)
proc.time() - pc

##    user  system elapsed 
##   76.94    0.26   87.24

summary(number.svm)

## 
## Call:
## svm(formula = V1 ~ ., data = digits_train, type = "C")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.00390625 
## 
## Number of Support Vectors:  2606
## 
##  ( 213 326 319 235 285 63 256 262 401 246 )
## 
## 
## Number of Classes:  10 
## 
## Levels: 
##  0 1 2 3 4 5 6 7 8 9

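#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
#Recent versions of caret expect both arguments to be factors with the same levels
confusionMatrix(label_predicted,factor(digits_train[,1], levels = levels(label_predicted)))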

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1    2    3    4    5    6    7    8    9
##          0 1194    0    0    0    0    0    2    0    0    0
##          1    0 1005    1    1    2    0    1    0    1    0
##          2    0    0  724    0    0    1    0    0    0    0
##          3    0    0    2  651    0    0    0    0    0    1
##          4    0    0    4    0  648    1    0    2    1    1
##          5    0    0    0    3    0  553    0    0    2    0
##          6    0    0    0    0    0    1  661    0    0    0
##          7    0    0    0    0    0    0    0  641    2    3
##          8    0    0    0    3    0    0    0    0  536    0
##          9    0    0    0    0    2    0    0    2    0  639
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9947          
##                  95% CI : (0.9927, 0.9962)
##     No Information Rate : 0.1638          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.994           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            1.0000   1.0000  0.99042  0.98936  0.99387  0.99460
## Specificity            0.9997   0.9990  0.99985  0.99955  0.99864  0.99926
## Pos Pred Value         0.9983   0.9941  0.99862  0.99541  0.98630  0.99104
## Neg Pred Value         1.0000   1.0000  0.99893  0.99895  0.99940  0.99955
## Prevalence             0.1638   0.1378  0.10026  0.09025  0.08943  0.07626
## Detection Rate         0.1638   0.1378  0.09930  0.08929  0.08888  0.07585
## Detection Prevalence   0.1640   0.1387  0.09944  0.08970  0.09011  0.07653
## Balanced Accuracy      0.9998   0.9995  0.99514  0.99445  0.99625  0.99693
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.99548  0.99380  0.98893  0.99224
## Specificity           0.99985  0.99925  0.99956  0.99940
## Pos Pred Value        0.99849  0.99226  0.99443  0.99378
## Neg Pred Value        0.99955  0.99940  0.99911  0.99925
## Prevalence            0.09107  0.08847  0.07434  0.08833
## Detection Rate        0.09066  0.08792  0.07352  0.08764
## Detection Prevalence  0.09080  0.08860  0.07393  0.08819
## Balanced Accuracy     0.99767  0.99652  0.99424  0.99582

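###Out-of-sample validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
#Recent versions of caret expect both arguments to be factors with the same levels
confusionMatrix(test_label_predicted,factor(digits_test[,1], levels = levels(test_label_predicted)))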

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1   2   3   4   5   6   7   8   9
##          0 351   0   2   0   0   3   4   0   4   0
##          1   0 253   0   0   1   0   0   0   0   0
##          2   6   1 183   5   3   2   4   2   2   0
##          3   0   0   4 146   0   3   0   0   3   0
##          4   1   5   3   0 186   1   2   5   0   4
##          5   0   1   0  11   1 147   1   0   2   1
##          6   0   3   1   0   2   0 158   0   1   0
##          7   0   1   1   1   3   0   0 138   0   0
##          8   1   0   4   3   1   1   1   0 151   2
##          9   0   0   0   0   3   3   0   2   3 170
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9382          
##                  95% CI : (0.9268, 0.9484)
##     No Information Rate : 0.1789          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9306          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity            0.9777   0.9583  0.92424  0.87952  0.93000  0.91875
## Specificity            0.9921   0.9994  0.98618  0.99457  0.98838  0.99080
## Pos Pred Value         0.9643   0.9961  0.87981  0.93590  0.89855  0.89634
## Neg Pred Value         0.9951   0.9937  0.99166  0.98920  0.99222  0.99295
## Prevalence             0.1789   0.1315  0.09865  0.08271  0.09965  0.07972
## Detection Rate         0.1749   0.1261  0.09118  0.07275  0.09268  0.07324
## Detection Prevalence   0.1814   0.1266  0.10364  0.07773  0.10314  0.08171
## Balanced Accuracy      0.9849   0.9789  0.95521  0.93704  0.95919  0.95477
##                      Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity           0.92941  0.93878  0.90964  0.96045
## Specificity           0.99619  0.99677  0.99294  0.99399
## Pos Pred Value        0.95758  0.95833  0.92073  0.93923
## Neg Pred Value        0.99349  0.99517  0.99186  0.99617
## Prevalence            0.08470  0.07324  0.08271  0.08819
## Detection Rate        0.07872  0.06876  0.07524  0.08470
## Detection Prevalence  0.08221  0.07175  0.08171  0.09018
## Balanced Accuracy     0.96280  0.96777  0.95129  0.97722
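
The overall accuracy and the per-class sensitivity reported above can be recomputed by hand from the confusion matrix. The sketch below copies the test-set matrix from the output above into base R; the variable names are made up for this illustration.

```r
# Test-set confusion matrix copied from the output above
# (rows = predicted digit, columns = true digit)
cm <- matrix(c(
  351,   0,   2,   0,   0,   3,   4,   0,   4,   0,
    0, 253,   0,   0,   1,   0,   0,   0,   0,   0,
    6,   1, 183,   5,   3,   2,   4,   2,   2,   0,
    0,   0,   4, 146,   0,   3,   0,   0,   3,   0,
    1,   5,   3,   0, 186,   1,   2,   5,   0,   4,
    0,   1,   0,  11,   1, 147,   1,   0,   2,   1,
    0,   3,   1,   0,   2,   0, 158,   0,   1,   0,
    0,   1,   1,   1,   3,   0,   0, 138,   0,   0,
    1,   0,   4,   3,   1,   1,   1,   0, 151,   2,
    0,   0,   0,   0,   3,   3,   0,   2,   3, 170
), nrow = 10, byrow = TRUE,
   dimnames = list(Prediction = as.character(0:9), Reference = as.character(0:9)))

correct     <- sum(diag(cm))          # correctly classified images (the diagonal)
total       <- sum(cm)                # all test images
accuracy    <- correct / total
sensitivity <- diag(cm) / colSums(cm) # per digit: correct / number of true images of that digit

total - correct       # 124 misclassified images
round(accuracy, 4)    # 0.9382
round(sensitivity, 4)
```

The lowest sensitivity belongs to digit 3, mostly because 11 true 3s were predicted as 5.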

#Let's see some predictions.
digits_test$predicted<-test_label_predicted

for(i in 1:10 )
{
data_row<-digits_test[i,c(-1,-ncol(digits_test))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
#image() draws matrix rows along the x-axis, so transpose and flip to show the digit upright
image(t(pixels)[,16:1], axes = FALSE)
title(main = paste("Label is" , digits_test[i,1] ,"  Prediction is" , digits_test[i,ncol(digits_test)]))
}

#Let's look at images of some wrong predictions.
# Wrong predictions
digits_test$predicted<-test_label_predicted
wrong_predictions<-digits_test[!(digits_test$predicted==digits_test$V1),]
nrow(wrong_predictions)

## [1] 124

for(i in 1:10 )
{
data_row<-wrong_predictions[i,c(-1,-ncol(wrong_predictions))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
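#image() draws matrix rows along the x-axis, so transpose and flip to show the digit upright
image(t(pixels)[,16:1], axes = FALSE)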
title(main = paste("Label is" , wrong_predictions[i,1] ,"  Prediction is" , wrong_predictions[i,ncol(wrong_predictions)]))
}

We can see that out of 2007 test images, only 124 were wrongly identified by our SVM model.
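
Beyond the raw error count, it can help to see which digits get mistaken for which. The sketch below tabulates (label, prediction) pairs among the errors. It runs on short made-up vectors that stand in for `digits_test$V1` and `digits_test$predicted`, so the counts are illustrative only.

```r
# Made-up labels and predictions, standing in for digits_test$V1 and digits_test$predicted
actual    <- c(3, 3, 3, 5, 8, 4, 9, 0, 2, 3)
predicted <- c(5, 5, 3, 3, 8, 9, 9, 0, 2, 5)

wrong <- actual != predicted                            # TRUE where the prediction missed
pairs <- paste(actual[wrong], "->", predicted[wrong])   # "true -> predicted"

# Count each kind of mistake and list the most frequent first
confusions <- sort(table(pairs), decreasing = TRUE)
confusions
```

In this toy data, the most frequent mistake is a true 3 predicted as 5, which occurs three times.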
With this post, we end the series here. In the next series, we will cover Random Forest and Boosting.

The next post is the SVM conclusion.

21st June 2017