Welcome to this blog series on neural networks. In this series we will go from the basics of neural networks to building a neural network model that recognizes digit images and reads them correctly.
In this post, we will revise our understanding of how logistic regression works, since it can be considered a building block of a neural network.
Recap of Logistic Regression
- Logistic regression predicts a categorical output of the YES/NO type.
- It uses the predictor variables to predict that categorical output.
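Under the hood, logistic regression passes a linear combination of the predictors through the sigmoid (logistic) function to produce a probability between 0 and 1. A minimal sketch of the function itself:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))          # 0.5: log-odds of zero means a 50/50 prediction
print(sigmoid(4) > 0.95)   # True: large positive log-odds push the probability toward 1
```

Values above a chosen threshold (typically 0.5) are labeled YES, the rest NO; this is exactly what we do with the fitted model later in this post.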
Practice : Logistic Regression
- Dataset: Emp_Productivity/Emp_Productivity.csv
- Filter the data and take a subset of the above dataset. The filter condition is Sample_Set<3.
- Draw a scatter plot with Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes).
- Build a logistic regression model to predict Productivity using Age and Experience.
- Finally, draw the decision boundary for this logistic regression model.
- Create the confusion matrix.
- Calculate the accuracy and error rates.
Solution
In [1]:
import pandas as pd
Emp_Productivity_raw = pd.read_csv("datasets/Emp_Productivity/Emp_Productivity.csv")
Emp_Productivity_raw.head(10)
Out[1]:
In [2]:
#Filter the data and take a subset of the above dataset. Filter condition: Sample_Set<3
Emp_Productivity1=Emp_Productivity_raw[Emp_Productivity_raw.Sample_Set<3]
Emp_Productivity1.shape
Out[2]:
In [3]:
#frequency table of Productivity variable
Emp_Productivity1.Productivity.value_counts()
Out[3]:
In [4]:
####The classification graph
#Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes.
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');
plt.show()
In [5]:
#predict Productivity using age and experience
###Logistic Regression model1
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity1)
fitted1 = model1.fit()
fitted1.summary()
Out[5]:
In [6]:
#coefficients (fitted1.params holds the estimated coefficients;
#normalized_cov_params is the covariance matrix, not the coefficients)
coef=fitted1.params
print(coef)
In [7]:
# getting slope and intercept of the decision boundary line
# the boundary is where Intercept + b_Age*Age + b_Exp*Experience = 0
slope1=-coef['Age']/coef['Experience']
intercept1=-coef['Intercept']/coef['Experience']
print(slope1)
print(intercept1)
Out[7]:
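For reference, the boundary line comes from setting the model's log-odds to zero: Intercept + b_Age*Age + b_Exp*Experience = 0, which rearranges to Experience = -(Intercept + b_Age*Age)/b_Exp. A quick sketch with made-up coefficients (for illustration only, not the fitted values above):

```python
# Hypothetical coefficients: Intercept, Age, Experience (illustration only)
b0, b_age, b_exp = -8.9, 0.28, 0.97

slope = -b_age / b_exp      # slope of the boundary in the Age-Experience plane
intercept = -b0 / b_exp     # intercept of the boundary

# Any point on this line has log-odds of exactly zero, i.e. probability 0.5
age = 20.0
experience = slope * age + intercept
log_odds = b0 + b_age * age + b_exp * experience
print(abs(log_odds) < 1e-9)   # True
```

Points above the line lean toward one class and points below toward the other, which is what the plot below shows.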
In [8]:
#Finally draw the decision boundary for this logistic regression model
import matplotlib.pyplot as plt
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');
x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])
ax1.set_xlim([15,35])
ax1.set_ylim([0,10])
plt.show()
- Accuracy of the model
In [9]:
#Predicting classes
import numpy as np
predicted_values=fitted1.predict(Emp_Productivity1[["Age","Experience"]])
print(predicted_values[1:10])
threshold=0.5
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1
predicted_class
Out[9]:
In [10]:
#Confusion Matrix, Accuracy and Error
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Productivity1['Productivity'],predicted_class)
print('Confusion Matrix :\n', ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/ConfusionMatrix.sum()
print('Accuracy : ',accuracy)
error=1-accuracy
print('Error: ',error)
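As a sanity check, the accuracy computed from the confusion matrix should agree with scikit-learn's `accuracy_score`. A small sketch on made-up labels (not the employee data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual vs predicted class labels, for illustration only
actual    = np.array([0, 0, 1, 1, 1, 0])
predicted = np.array([0, 1, 1, 1, 0, 0])

cm = confusion_matrix(actual, predicted)
acc = (cm[0, 0] + cm[1, 1]) / cm.sum()   # (TN + TP) / total
print(np.isclose(acc, accuracy_score(actual, predicted)))   # True
```

Computing it by hand from the matrix, as in the cell above, makes it explicit that accuracy is just the diagonal of the confusion matrix divided by the total count.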