Link to the previous post : https://statinfer.com/204-5-11-hidden-layers-and-their-roles/

As promised in the first post of the series, we will now build a neural network that reads the image of a handwritten digit and identifies the number.

Practice : Digit Recognizer

Take an image of a single handwritten digit and determine which digit it is.
The data are normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
Build a neural network model that can be used as the digit recognizer.
Use the test dataset to validate the true classification power of the model.
What is the final accuracy of the model?
We can see them as multiple lines on the decision space
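
To make the file layout described above concrete, here is a minimal sketch that splits one record into its label and its 16 x 16 image. The record is synthetic (made-up pixel values), and parse_record is a hypothetical helper name, not part of the dataset or of any library.

```python
import numpy as np

def parse_record(line):
    """Split one whitespace-separated record into (label, 16x16 image)."""
    values = np.array(line.split(), dtype=float)
    if values.size != 257:
        raise ValueError("expected 1 label + 256 pixel values")
    label = int(values[0])              # first value is the digit id (0-9)
    image = values[1:].reshape(16, 16)  # remaining 256 values form the image
    return label, image

# Synthetic record (assumption): label 6 followed by 256 background pixels
fake_line = "6 " + " ".join(["-1.0"] * 256)
label, image = parse_record(fake_line)
print(label, image.shape)
```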

In [55]:

#Importing test and training data
import numpy as np
import pandas as pd     #needed below to wrap the arrays in DataFrames
digits_train = np.loadtxt("datasets\\Digit Recognizer\\USPS\\zip.train.txt")

In [56]:

#digits_train is a numpy array. We convert it into a DataFrame for easier handling
train_data=pd.DataFrame(digits_train)
train_data.shape

Out[56]:

(7291, 257)

In [57]:

digits_test = np.loadtxt("datasets\\Digit Recognizer\\USPS\\zip.test.txt")
#digits_test is a numpy array. We convert it into a DataFrame for easier handling
test_data=pd.DataFrame(digits_test)
test_data.shape

Out[57]:

(2007, 257)

In [58]:

train_data[0].value_counts()     #To get labels of the images

Out[58]:

0.0    1194
1.0    1005
2.0     731
6.0     664
3.0     658
4.0     652
7.0     645
9.0     644
5.0     556
8.0     542
Name: 0, dtype: int64

In [59]:

import matplotlib.pyplot as plt

#Let's have a look at some images.

plt.figure(figsize=(10,10))          #one figure shared by all five images
for i in range(0,5):
    data_row=digits_train[i][1:]     #skip the label in column 0
    pixels=data_row.reshape(16,16)   #256 values -> 16 x 16 image
    plt.subplot(3,3,i+1)
    plt.imshow(pixels)

In [60]:

#Creating multiple columns for multiple outputs
#####We need these variables while building the model
digit_labels=pd.DataFrame()
digit_labels['label']=train_data[0:][0]
label_names=['I0','I1','I2','I3','I4','I5','I6','I7','I8','I9']
for i in range(0,10):
    digit_labels[label_names[i]]=digit_labels.label==i
#see our newly created labels data
digit_labels.head(10)

Out[60]:

	label	I0	I1	I2	I3	I4	I5	I6	I7	I8	I9
0	6.0	False	False	False	False	False	False	True	False	False	False
1	5.0	False	False	False	False	False	True	False	False	False	False
2	4.0	False	False	False	False	True	False	False	False	False	False
3	7.0	False	False	False	False	False	False	False	True	False	False
4	3.0	False	False	False	True	False	False	False	False	False	False
5	6.0	False	False	False	False	False	False	True	False	False	False
6	3.0	False	False	False	True	False	False	False	False	False	False
7	1.0	False	True	False	False	False	False	False	False	False	False
8	0.0	True	False	False	False	False	False	False	False	False	False
9	1.0	False	True	False	False	False	False	False	False	False	False
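
The indicator columns above are a one-hot encoding of the label. As a rough sketch, the same encoding can be produced in one step with NumPy. The labels below are just the first five from the table, and one_hot is an illustrative name.

```python
import numpy as np

labels = np.array([6.0, 5.0, 4.0, 7.0, 3.0])   # first five labels shown above

# Row k of the 10 x 10 identity matrix is the one-hot vector for digit k
one_hot = np.eye(10, dtype=bool)[labels.astype(int)]

print(one_hot.shape)          # one row per image, one column per digit
print(one_hot[0].argmax())    # position of the single True in the first row
```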

In [61]:

#Update the training dataset
train_data1=pd.concat([train_data,digit_labels],axis=1)
print(train_data1.shape)
train_data1.head(5)

(7291, 268)

Out[61]:

	0	1	2	3	4	5	6	7	8	9	…	I0	I1	I2	I3	I4	I5	I6	I7	I8	I9
0	6.0	-1.0	-1.0	-1.0	-1.000	-1.000	-1.000	-1.000	-0.631	0.862	…	False	False	False	False	False	False	True	False	False	False
1	5.0	-1.0	-1.0	-1.0	-0.813	-0.671	-0.809	-0.887	-0.671	-0.853	…	False	False	False	False	False	True	False	False	False	False
2	4.0	-1.0	-1.0	-1.0	-1.000	-1.000	-1.000	-1.000	-1.000	-1.000	…	False	False	False	False	True	False	False	False	False	False
3	7.0	-1.0	-1.0	-1.0	-1.000	-1.000	-0.273	0.684	0.960	0.450	…	False	False	False	False	False	False	False	True	False	False
4	3.0	-1.0	-1.0	-1.0	-1.000	-1.000	-0.928	-0.204	0.751	0.466	…	False	False	False	True	False	False	False	False	False	False

5 rows × 268 columns

In [62]:

#########Neural network building
import neurolab as nl
import numpy as np
import pylab as pl

x_train=train_data.drop(train_data.columns[[0]], axis=1)
y_train=digit_labels.drop(digit_labels.columns[[0]], axis=1)

In [63]:

#getting minimum and maximum of each column of x_train into a list
def minMax(x):
    return pd.Series(index=['min','max'],data=[x.min(),x.max()])

In [64]:

listvalues = x_train.apply(minMax).T.values.tolist()

error = []
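
listvalues holds one [min, max] pair per input column, which is the form the network constructor used below is given. A minimal sketch on a tiny made-up frame shows the shape of that list and an equivalent NumPy construction; toy and ranges are illustrative names.

```python
import numpy as np
import pandas as pd

# Tiny made-up stand-in for x_train: 3 rows, 2 pixel columns
toy = pd.DataFrame({1: [-1.0, 0.5, 0.2], 2: [-0.3, -1.0, 1.0]})

# One [min, max] pair per column
ranges = np.column_stack([toy.min(axis=0).values,
                          toy.max(axis=0).values]).tolist()
print(ranges)
```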

In [66]:

# Create a randomly initialized network with one hidden layer (20 neurons) and an output layer (10 neurons)
net = nl.net.newff(listvalues,[20,10],transf=[nl.trans.LogSig()] * 2)
net.trainf = nl.train.train_rprop
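
For intuition about what a 256-20-10 network with logistic activations computes, here is a minimal NumPy sketch of a forward pass. The weights are random placeholders and the inputs are made up; this is not neurolab's internal code and only illustrates the layer shapes.

```python
import numpy as np

def logsig(z):
    """Logistic sigmoid, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Placeholder weights (assumption): 256 inputs -> 20 hidden -> 10 outputs
W1, b1 = rng.normal(scale=0.1, size=(256, 20)), np.zeros(20)
W2, b2 = rng.normal(scale=0.1, size=(20, 10)), np.zeros(10)

def forward(x):
    hidden = logsig(x @ W1 + b1)      # 20 hidden activations per image
    return logsig(hidden @ W2 + b2)   # 10 output scores per image

x = rng.uniform(-1, 1, size=(4, 256))  # four made-up "images"
scores = forward(x)
print(scores.shape)
```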

In [67]:

# Train network
import time
start_time = time.time()
error.append(net.train(x_train, y_train, show=0, epochs = 250,goal=0.02))
print("--- %s seconds ---" % (time.time() - start_time))

--- 286.51438784599304 seconds ---

In [68]:

# Predict on the test data
x_test=test_data.drop(test_data.columns[[0]], axis=1)
y_test=test_data[0:][0]

#.values replaces as_matrix(), which was removed in pandas 1.0
predicted_values = net.sim(x_test.values)
predict=pd.DataFrame(predicted_values)

index=predict.idxmax(axis=1)
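
idxmax(axis=1) picks, for each test image, the output column with the highest score. Because the columns are numbered 0 to 9, that column label is the predicted digit. A small sketch with made-up scores:

```python
import numpy as np
import pandas as pd

# Made-up network outputs for two images (10 scores each)
scores = pd.DataFrame([
    [0.1, 0.0, 0.0, 0.9, 0.0, 0.2, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0, 0.0],
])

predicted = scores.idxmax(axis=1)   # column label of the largest score per row
print(predicted.tolist())
```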

In [69]:

#confusion matrix
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(y_test,index)
print('Confusion Matrix : ', ConfusionMatrix)

#accuracy
accuracy=np.trace(ConfusionMatrix)/sum(sum(ConfusionMatrix))
print('Accuracy : ', accuracy)

error=1-accuracy
print('Error : ', error)

Confusion Matrix :  [[339   0   5   3   2   4   5   1   0   0]
 [  0 249   2   2   3   1   4   1   1   1]
 [  4   0 169   4   7   2   6   1   5   0]
 [  3   0   5 143   0   9   0   1   2   3]
 [  0   2   4   0 180   4   2   2   1   5]
 [  6   0   2  10   2 134   0   2   1   3]
 [  4   0   3   0   4   5 152   0   2   0]
 [  0   0   1   1   4   0   0 135   2   4]
 [  6   1   2   7   1   5   1   3 137   3]
 [  0   1   1   0   2   1   0   3   1 168]]
Accuracy :  0.899850523169
Error :  0.100149476831
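
The overall accuracy hides differences between digits. As a sketch, the per-digit recall (the share of each true digit that was classified correctly) can be read off the confusion matrix printed above. Rows are true digits and columns are predictions, following scikit-learn's convention; conf and recall are illustrative names.

```python
import numpy as np

# Confusion matrix copied from the output above
conf = np.array([
    [339,   0,   5,   3,   2,   4,   5,   1,   0,   0],
    [  0, 249,   2,   2,   3,   1,   4,   1,   1,   1],
    [  4,   0, 169,   4,   7,   2,   6,   1,   5,   0],
    [  3,   0,   5, 143,   0,   9,   0,   1,   2,   3],
    [  0,   2,   4,   0, 180,   4,   2,   2,   1,   5],
    [  6,   0,   2,  10,   2, 134,   0,   2,   1,   3],
    [  4,   0,   3,   0,   4,   5, 152,   0,   2,   0],
    [  0,   0,   1,   1,   4,   0,   0, 135,   2,   4],
    [  6,   1,   2,   7,   1,   5,   1,   3, 137,   3],
    [  0,   1,   1,   0,   2,   1,   0,   3,   1, 168],
])

accuracy = np.trace(conf) / conf.sum()        # correct predictions / all predictions
recall = np.diag(conf) / conf.sum(axis=1)     # per-digit share classified correctly

for digit, r in enumerate(recall):
    print(digit, round(r, 3))
print("accuracy:", accuracy)
```

On this matrix the digit with the lowest recall is 8 (137 of 166 correct), so that is where the model struggles most.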

The next post concludes the series on neural networks.
Link to the next post : https://statinfer.com/204-5-13-neural-networks-conclusion/

21st June 2017

204.5.12 Practice : Digit Recognizer

Implementing Neural Network on digit image data.
