Link to the previous post: https://statinfer.com/204-6-8-svm-advantages-disadvantages-applications/
In this post, we will put SVM into practice by solving an image classification problem.
LAB: Digit Recognition using SVM
- Take an image of a single handwritten digit and determine which digit it is.
- Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
- The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
- Build an SVM model that can be used as the digit recognizer.
- Use the test dataset to validate the true classification power of the model.
- What is the final accuracy of the model?
Solution
#Importing test and training data
digits_train <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.train.txt", quote="\"", comment.char="")
digits_test <- read.table("C:\\Amrita\\Datavedi\\Digit Recognizer\\USPS\\zip.test.txt", quote="\"", comment.char="")
dim(digits_train)
## [1] 7291 257
dim(digits_test)
## [1] 2007 257
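#Note: as mentioned above, the data ship as two gzipped files. read.table() can
#read a gz connection directly, so unzipping first is optional. A small sketch,
#assuming the files are named zip.train.gz and zip.test.gz in the working directory:
#digits_train <- read.table(gzfile("zip.train.gz"), quote="\"", comment.char="")
#digits_test  <- read.table(gzfile("zip.test.gz"),  quote="\"", comment.char="")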
#Let's see some images.
for(i in 1:6 )
{
data_row<-digits_train[i,-1]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , digits_train[i,1]), font.main = 4)
}
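#Optional: to see all six digits in one window, the same loop can be wrapped in
#a 2 x 3 plotting grid. Also note that image() maps the first matrix index to the
#x-axis and the second to the y-axis (bottom to top), so the digits above appear
#rotated; assuming each group of 16 values is one row of the picture from top to
#bottom, transposing the matrix and reversing its columns shows the digit upright.
par(mfrow = c(2, 3))
for(i in 1:6)
{
data_row <- digits_train[i, -1]
pixels <- matrix(as.numeric(data_row), 16, 16, byrow = TRUE)
image(t(pixels)[, 16:1], axes = FALSE)
title(main = paste("Label is", digits_train[i, 1]), font.main = 4)
}
par(mfrow = c(1, 1))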
#Are there any missing values?
sum(is.na(digits_train))
## [1] 0
sum(is.na(digits_test))
## [1] 0
#The first variable is the label
table(digits_train$V1)
##
## 0 1 2 3 4 5 6 7 8 9
## 1194 1005 731 658 652 556 664 645 542 644
table(digits_test$V1)
##
## 0 1 2 3 4 5 6 7 8 9
## 359 264 198 166 200 160 170 147 166 177
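#Optional: the same counts expressed as proportions give a quick check of class
#balance; the digit distribution is roughly similar in the train and test sets.
round(prop.table(table(digits_train$V1)), 3)
round(prop.table(table(digits_test$V1)), 3)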
########SVM Model Building
library(e1071)
#Let's keep an eye on the runtime
pc <- proc.time()
#Verify the code on limited data: the first 5000 rows
number.svm <- svm(V1 ~. , type="C", data = digits_train[1:5000,])
proc.time() - pc
## user system elapsed
## 38.25 0.14 39.37
summary(number.svm)
##
## Call:
## svm(formula = V1 ~ ., data = digits_train[1:5000, ], type = "C")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.00390625
##
## Number of Support Vectors: 2028
##
## ( 181 232 245 189 195 45 220 206 305 210 )
##
##
## Number of Classes: 10
##
## Levels:
## 0 1 2 3 4 5 6 7 8 9
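#The summary shows the defaults used with the radial kernel: cost = 1 and
#gamma = 1/256 (one over the number of features). If we want to search for
#better hyperparameters, e1071 also provides tune.svm() for a cross-validated
#grid search. A minimal sketch on a small subset (the grid below is only an
#illustrative assumption, and the search can take a while to run):
digit.tune <- tune.svm(factor(V1) ~ ., data = digits_train[1:2000, ],
                       gamma = 2^(-9:-7), cost = 2^(0:2))
summary(digit.tune)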
#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
confusionMatrix(label_predicted,digits_train[1:5000, 1])
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 847 0 0 0 0 0 1 0 0 0
## 1 0 674 1 0 1 0 1 0 0 0
## 2 0 0 484 0 0 1 0 0 0 0
## 3 0 0 1 392 0 0 0 0 1 1
## 4 0 0 2 0 429 0 0 1 0 0
## 5 0 0 0 1 0 350 1 0 2 0
## 6 0 0 0 0 1 1 475 0 0 0
## 7 0 0 0 0 0 0 0 459 1 2
## 8 0 0 0 2 0 0 0 0 383 0
## 9 0 0 0 0 3 0 0 1 0 481
##
## Overall Statistics
##
## Accuracy : 0.9948
## 95% CI : (0.9924, 0.9966)
## No Information Rate : 0.1694
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9942
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.0000 1.0000 0.9918 0.9924 0.9885 0.9943
## Specificity 0.9998 0.9993 0.9998 0.9993 0.9993 0.9991
## Pos Pred Value 0.9988 0.9956 0.9979 0.9924 0.9931 0.9887
## Neg Pred Value 1.0000 1.0000 0.9991 0.9993 0.9989 0.9996
## Prevalence 0.1694 0.1348 0.0976 0.0790 0.0868 0.0704
## Detection Rate 0.1694 0.1348 0.0968 0.0784 0.0858 0.0700
## Detection Prevalence 0.1696 0.1354 0.0970 0.0790 0.0864 0.0708
## Balanced Accuracy 0.9999 0.9997 0.9958 0.9959 0.9939 0.9967
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.9937 0.9957 0.9897 0.9938
## Specificity 0.9996 0.9993 0.9996 0.9991
## Pos Pred Value 0.9958 0.9935 0.9948 0.9918
## Neg Pred Value 0.9993 0.9996 0.9991 0.9993
## Prevalence 0.0956 0.0922 0.0774 0.0968
## Detection Rate 0.0950 0.0918 0.0766 0.0962
## Detection Prevalence 0.0954 0.0924 0.0770 0.0970
## Balanced Accuracy 0.9966 0.9975 0.9946 0.9965
table(label_predicted,digits_train[1:5000, 1])
##
## label_predicted 0 1 2 3 4 5 6 7 8 9
## 0 847 0 0 0 0 0 1 0 0 0
## 1 0 674 1 0 1 0 1 0 0 0
## 2 0 0 484 0 0 1 0 0 0 0
## 3 0 0 1 392 0 0 0 0 1 1
## 4 0 0 2 0 429 0 0 1 0 0
## 5 0 0 0 1 0 350 1 0 2 0
## 6 0 0 0 0 1 1 475 0 0 0
## 7 0 0 0 0 0 0 0 459 1 2
## 8 0 0 0 2 0 0 0 0 383 0
## 9 0 0 0 0 3 0 0 1 0 481
###Out-of-sample validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
confusionMatrix(test_label_predicted,digits_test[,1])
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 351 0 3 0 0 3 5 0 3 0
## 1 0 253 0 0 1 0 0 0 0 0
## 2 6 2 182 6 5 4 4 3 4 0
## 3 0 0 4 144 0 3 0 0 4 0
## 4 1 5 4 0 185 1 2 5 0 4
## 5 0 0 0 11 2 145 1 0 5 1
## 6 0 3 1 0 3 0 158 0 1 0
## 7 0 0 1 1 1 0 0 137 0 1
## 8 1 0 3 3 0 1 0 0 146 2
## 9 0 1 0 1 3 3 0 2 3 169
##
## Overall Statistics
##
## Accuracy : 0.9317
## 95% CI : (0.9198, 0.9424)
## No Information Rate : 0.1789
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9233
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.9777 0.9583 0.91919 0.86747 0.92500 0.90625
## Specificity 0.9915 0.9994 0.98121 0.99402 0.98783 0.98917
## Pos Pred Value 0.9616 0.9961 0.84259 0.92903 0.89372 0.87879
## Neg Pred Value 0.9951 0.9937 0.99107 0.98812 0.99167 0.99186
## Prevalence 0.1789 0.1315 0.09865 0.08271 0.09965 0.07972
## Detection Rate 0.1749 0.1261 0.09068 0.07175 0.09218 0.07225
## Detection Prevalence 0.1819 0.1266 0.10762 0.07723 0.10314 0.08221
## Balanced Accuracy 0.9846 0.9789 0.95020 0.93075 0.95641 0.94771
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.92941 0.93197 0.87952 0.95480
## Specificity 0.99565 0.99785 0.99457 0.99290
## Pos Pred Value 0.95181 0.97163 0.93590 0.92857
## Neg Pred Value 0.99348 0.99464 0.98920 0.99562
## Prevalence 0.08470 0.07324 0.08271 0.08819
## Detection Rate 0.07872 0.06826 0.07275 0.08421
## Detection Prevalence 0.08271 0.07025 0.07773 0.09068
## Balanced Accuracy 0.96253 0.96491 0.93704 0.97385
#####Model on Full Data
pc <- proc.time()
number.svm <- svm(V1 ~. , type="C", data = digits_train)
proc.time() - pc
## user system elapsed
## 76.94 0.26 87.24
summary(number.svm)
##
## Call:
## svm(formula = V1 ~ ., data = digits_train, type = "C")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
## gamma: 0.00390625
##
## Number of Support Vectors: 2606
##
## ( 213 326 319 235 285 63 256 262 401 246 )
##
##
## Number of Classes: 10
##
## Levels:
## 0 1 2 3 4 5 6 7 8 9
#Confusion Matrix
library(caret)
label_predicted<-predict(number.svm, type = "class")
confusionMatrix(label_predicted,digits_train[,1])
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 1194 0 0 0 0 0 2 0 0 0
## 1 0 1005 1 1 2 0 1 0 1 0
## 2 0 0 724 0 0 1 0 0 0 0
## 3 0 0 2 651 0 0 0 0 0 1
## 4 0 0 4 0 648 1 0 2 1 1
## 5 0 0 0 3 0 553 0 0 2 0
## 6 0 0 0 0 0 1 661 0 0 0
## 7 0 0 0 0 0 0 0 641 2 3
## 8 0 0 0 3 0 0 0 0 536 0
## 9 0 0 0 0 2 0 0 2 0 639
##
## Overall Statistics
##
## Accuracy : 0.9947
## 95% CI : (0.9927, 0.9962)
## No Information Rate : 0.1638
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.994
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 1.0000 1.0000 0.99042 0.98936 0.99387 0.99460
## Specificity 0.9997 0.9990 0.99985 0.99955 0.99864 0.99926
## Pos Pred Value 0.9983 0.9941 0.99862 0.99541 0.98630 0.99104
## Neg Pred Value 1.0000 1.0000 0.99893 0.99895 0.99940 0.99955
## Prevalence 0.1638 0.1378 0.10026 0.09025 0.08943 0.07626
## Detection Rate 0.1638 0.1378 0.09930 0.08929 0.08888 0.07585
## Detection Prevalence 0.1640 0.1387 0.09944 0.08970 0.09011 0.07653
## Balanced Accuracy 0.9998 0.9995 0.99514 0.99445 0.99625 0.99693
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.99548 0.99380 0.98893 0.99224
## Specificity 0.99985 0.99925 0.99956 0.99940
## Pos Pred Value 0.99849 0.99226 0.99443 0.99378
## Neg Pred Value 0.99955 0.99940 0.99911 0.99925
## Prevalence 0.09107 0.08847 0.07434 0.08833
## Detection Rate 0.09066 0.08792 0.07352 0.08764
## Detection Prevalence 0.09080 0.08860 0.07393 0.08819
## Balanced Accuracy 0.99767 0.99652 0.99424 0.99582
###Out-of-sample validation with test data
test_label_predicted<-predict(number.svm, newdata =digits_test[,-1] , type = "class")
confusionMatrix(test_label_predicted,digits_test[,1])
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1 2 3 4 5 6 7 8 9
## 0 351 0 2 0 0 3 4 0 4 0
## 1 0 253 0 0 1 0 0 0 0 0
## 2 6 1 183 5 3 2 4 2 2 0
## 3 0 0 4 146 0 3 0 0 3 0
## 4 1 5 3 0 186 1 2 5 0 4
## 5 0 1 0 11 1 147 1 0 2 1
## 6 0 3 1 0 2 0 158 0 1 0
## 7 0 1 1 1 3 0 0 138 0 0
## 8 1 0 4 3 1 1 1 0 151 2
## 9 0 0 0 0 3 3 0 2 3 170
##
## Overall Statistics
##
## Accuracy : 0.9382
## 95% CI : (0.9268, 0.9484)
## No Information Rate : 0.1789
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9306
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5
## Sensitivity 0.9777 0.9583 0.92424 0.87952 0.93000 0.91875
## Specificity 0.9921 0.9994 0.98618 0.99457 0.98838 0.99080
## Pos Pred Value 0.9643 0.9961 0.87981 0.93590 0.89855 0.89634
## Neg Pred Value 0.9951 0.9937 0.99166 0.98920 0.99222 0.99295
## Prevalence 0.1789 0.1315 0.09865 0.08271 0.09965 0.07972
## Detection Rate 0.1749 0.1261 0.09118 0.07275 0.09268 0.07324
## Detection Prevalence 0.1814 0.1266 0.10364 0.07773 0.10314 0.08171
## Balanced Accuracy 0.9849 0.9789 0.95521 0.93704 0.95919 0.95477
## Class: 6 Class: 7 Class: 8 Class: 9
## Sensitivity 0.92941 0.93878 0.90964 0.96045
## Specificity 0.99619 0.99677 0.99294 0.99399
## Pos Pred Value 0.95758 0.95833 0.92073 0.93923
## Neg Pred Value 0.99349 0.99517 0.99186 0.99617
## Prevalence 0.08470 0.07324 0.08271 0.08819
## Detection Rate 0.07872 0.06876 0.07524 0.08470
## Detection Prevalence 0.08221 0.07175 0.08171 0.09018
## Balanced Accuracy 0.96280 0.96777 0.95129 0.97722
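#The lab asks for the final accuracy of the model. confusionMatrix() returns an
#object whose components can be accessed directly, so the headline number can be
#pulled out on its own (a small convenience sketch; cm is just a local name):
cm <- confusionMatrix(test_label_predicted, digits_test[, 1])
cm$overall["Accuracy"]
#With the model trained on the full data, the test accuracy is about 93.8%.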
#Let's see some predictions.
digits_test$predicted<-test_label_predicted
for(i in 1:10 )
{
data_row<-digits_test[i,c(-1,-ncol(digits_test))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , digits_test[i,1] ," Prediction is" , digits_test[i,ncol(digits_test)]))
}
#Let's look at some images where the prediction went wrong.
# Wrong predictions
digits_test$predicted<-test_label_predicted
wrong_predictions<-digits_test[!(digits_test$predicted==digits_test$V1),]
nrow(wrong_predictions)
## [1] 124
for(i in 1:10 )
{
data_row<-wrong_predictions[i,c(-1,-ncol(wrong_predictions))]
pixels = matrix(as.numeric(data_row),16,16,byrow=TRUE)
image(pixels, axes = FALSE)
title(main = paste("Label is" , wrong_predictions[i,1] ," Prediction is" , wrong_predictions[i,ncol(wrong_predictions)]))
}
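#Optional: a compact view of which digits get confused with which, by
#cross-tabulating the actual and predicted labels of the 124 misclassified
#test images.
table(actual = wrong_predictions$V1, predicted = wrong_predictions$predicted)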
Conclusion
- Many software tools are available for implementing SVMs.
- SVMs work very well for text classification.
- SVMs are good at finding the best linear separator. The kernel trick makes SVMs non-linear learning algorithms.
- Choosing an appropriate kernel is key to a good SVM model, and picking the right kernel function is not easy (a quick kernel-comparison sketch follows this list).
- We need to be patient while building SVMs on large datasets; they can take a long time to train.
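As a quick illustration of how much the kernel choice matters on this dataset, the sketch below refits the model with a linear kernel and computes its test accuracy for comparison against the radial-kernel results above. The variable names are our own, and training on the full data is again slow.
#Refit with a linear kernel and score the test set
number.svm.linear <- svm(V1 ~ ., type = "C", kernel = "linear", data = digits_train)
linear_pred <- predict(number.svm.linear, newdata = digits_test[, -1])
#Proportion of test digits classified correctly by the linear-kernel model
mean(as.character(linear_pred) == as.character(digits_test$V1))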