
204.5.1 Neural Networks : A Recap of Logistic Regression

Before we dig into neural networks

Welcome to this blog series on neural networks. In this series (204.5), we will go from the basics of neural networks to building a neural network model that recognizes images of digits and reads them correctly.

In this post, we will revise how logistic regression works, since it can be considered a building block of a neural network.

Recap of Logistic Regression

  • The output is categorical, of the YES/NO type.
  • The predictor variables are used to predict this categorical output.
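
As a minimal sketch of the idea (not the model fitted later in this post), logistic regression passes a weighted sum of the predictors through the logistic (sigmoid) function to get a probability, and then applies a threshold to get a YES/NO class. The weights and inputs below are made-up values chosen only for illustration, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, intercept, weights):
    """P(y = 1 | x) for each row of X under a logistic regression model."""
    return sigmoid(intercept + X @ weights)

# Hypothetical weights and inputs, for illustration only
intercept = -4.0
weights = np.array([0.1, 0.5])
X = np.array([[20.0, 2.0],    # linear predictor: -4 + 2 + 1 = -1
              [30.0, 6.0]])   # linear predictor: -4 + 3 + 3 =  2

probabilities = predict_proba(X, intercept, weights)
predicted_class = (probabilities > 0.5).astype(int)  # threshold at 0.5
print(probabilities)
print(predicted_class)
```

The first row has a negative linear predictor, so its probability falls below 0.5 and it is assigned class 0; the second row is assigned class 1.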

Practice : Logistic Regression

  • Dataset: Emp_Productivity/Emp_Productivity.csv
  • Filter the data and take a subset of the above dataset. The filter condition is Sample_Set<3.
  • Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes).
  • Build a logistic regression model to predict Productivity using Age and Experience.
  • Finally, draw the decision boundary for this logistic regression model.
  • Create the confusion matrix.
  • Calculate the accuracy and error rates.

Solution

In [1]:
import pandas as pd
Emp_Productivity_raw = pd.read_csv("datasets\\Emp_Productivity\\Emp_Productivity.csv")
Emp_Productivity_raw.head(10)
Out[1]:
    Age  Experience  Productivity  Sample_Set
0  20.0         2.3             0           1
1  16.2         2.2             0           1
2  20.2         1.8             0           1
3  18.8         1.4             0           1
4  18.9         3.2             0           1
5  16.7         3.9             0           1
6  16.3         1.4             0           1
7  20.0         1.4             0           1
8  18.0         3.6             0           1
9  21.2         4.3             0           1
In [2]:
#Filter the data and take a subset of the above dataset. Filter condition is Sample_Set<3
Emp_Productivity1=Emp_Productivity_raw[Emp_Productivity_raw.Sample_Set<3]
Emp_Productivity1.shape
Out[2]:
(74, 4)
In [3]:
#frequency table of Productivity variable
Emp_Productivity1.Productivity.value_counts()
Out[3]:
1    41
0    33
Name: Productivity, dtype: int64
In [4]:
####The classification graph
#Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes.
import matplotlib.pyplot as plt
%matplotlib inline

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');
plt.show()
In [5]:
#predict Productivity using age and experience
###Logistic Regression model1
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity1)
fitted1 = model1.fit()
fitted1.summary()
Optimization terminated successfully.
         Current function value: 0.315987
         Iterations 7
Out[5]:
                     Logit Regression Results
Dep. Variable:   Productivity       No. Observations:   74
Model:           Logit              Df Residuals:       71
Method:          MLE                Df Model:           2
Date:            Tue, 15 Nov 2016   Pseudo R-squ.:      0.5402
Time:            16:08:12           Log-Likelihood:     -23.383
converged:       True               LL-Null:            -50.860
                                    LLR p-value:        1.167e-12

              coef      std err   z        P>|z|   [95.0% Conf. Int.]
Intercept    -8.9361    2.061    -4.335    0.000   -12.976   -4.896
Age           0.2763    0.105     2.620    0.009     0.070    0.483
Experience    0.5923    0.298     1.988    0.047     0.008    1.176
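
To see how the coefficient table is used, here is a small sketch that plugs the rounded coefficients from the summary into the logistic function for the first row of the dataset (Age 20.0, Experience 2.3). Because the coefficients are rounded to four decimals, the result is approximate, and the variable names are illustrative rather than part of the original notebook.

```python
import math

# Rounded coefficients copied from the summary table
b_intercept, b_age, b_experience = -8.9361, 0.2763, 0.5923

# First row of the dataset shown in Out[1]
age, experience = 20.0, 2.3

# Linear predictor (log-odds), then the logistic function
log_odds = b_intercept + b_age * age + b_experience * experience
probability = 1.0 / (1.0 + math.exp(-log_odds))

print(round(log_odds, 3))      # about -2.048
print(round(probability, 3))   # about 0.114
```

The probability is well below 0.5, so this employee is assigned class 0, which agrees with the observed Productivity of 0 for that row.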
In [6]:
#coefficients (fitted1.params holds the estimates; normalized_cov_params is their covariance matrix)
coef=fitted1.params
print(coef.round(4))
Intercept    -8.9361
Age           0.2763
Experience    0.5923
dtype: float64
In [7]:
# getting slope and intercept of the decision boundary line
# boundary: Intercept + Age_coef*Age + Experience_coef*Experience = 0
slope1=coef["Age"]/(-coef["Experience"])
intercept1=coef["Intercept"]/(-coef["Experience"])
print(round(slope1, 2), round(intercept1, 2))
-0.47 15.09
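
The decision boundary is the set of points where the predicted probability is exactly 0.5, that is, where the linear predictor Intercept + b_Age*Age + b_Experience*Experience equals 0. Solving for Experience gives a straight line in the Age-Experience plane. The sketch below works this out from the rounded coefficients in the summary table, so the numbers are approximate, and the variable and function names are illustrative.

```python
# Rounded coefficients copied from the summary table
b_intercept, b_age, b_experience = -8.9361, 0.2763, 0.5923

# Setting the log-odds to zero and solving for Experience:
#   b_intercept + b_age*Age + b_experience*Experience = 0
#   Experience = -(b_intercept / b_experience) - (b_age / b_experience) * Age
boundary_slope = -b_age / b_experience
boundary_intercept = -b_intercept / b_experience

def log_odds(age, experience):
    """Linear predictor of the fitted model for one employee."""
    return b_intercept + b_age * age + b_experience * experience

# Any point on the line has log-odds 0, i.e. probability 0.5
age = 25.0
experience_on_line = boundary_slope * age + boundary_intercept
print(round(boundary_slope, 2), round(boundary_intercept, 2))
print(abs(log_odds(age, experience_on_line)) < 1e-9)   # on the boundary
print(log_odds(age, experience_on_line + 1.0) > 0)     # above the line: class 1
```

Points above the line have positive log-odds and are classified as Productivity 1; points below it are classified as Productivity 0.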
In [8]:
#Finally draw the decision boundary for this logistic regression model
import matplotlib.pyplot as plt

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.legend(loc='upper left');

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])
ax1.set_xlim([15,35])
ax1.set_ylim([0,10])
plt.show()
  • Accuracy of the model
In [9]:
#Predicting classes
import numpy as np

predicted_values=fitted1.predict(Emp_Productivity1[["Age","Experience"]])

#classify as 1 when the predicted probability is above the threshold
threshold=0.5
predicted_class=np.zeros(predicted_values.shape)
predicted_class[predicted_values>threshold]=1

predicted_class
Out[9]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
In [10]:
#Confusion Matrix, Accuracy and Error
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Productivity1[['Productivity']],predicted_class)
print('Confusion Matrix :', ConfusionMatrix)
accuracy=(ConfusionMatrix[0,0]+ConfusionMatrix[1,1])/sum(sum(ConfusionMatrix))
print('Accuracy : ',accuracy)
error=1-accuracy
print('Error: ',error)
Confusion Matrix : [[31  2]
 [ 2 39]]
Accuracy :  0.945945945946
Error:  0.0540540540541
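
The same numbers can be reproduced directly from the confusion matrix printed above. In scikit-learn's layout, rows are actual classes and columns are predicted classes, so the diagonal holds the correct predictions. The sketch below hard-codes the matrix from that output and also derives sensitivity and specificity, which the original cell does not compute; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# rows = actual class (0, 1), columns = predicted class (0, 1)
confusion = np.array([[31, 2],
                      [2, 39]])

tn, fp, fn, tp = confusion.ravel()

accuracy = (tn + tp) / confusion.sum()
error = 1 - accuracy
sensitivity = tp / (tp + fn)   # share of actual 1s predicted as 1
specificity = tn / (tn + fp)   # share of actual 0s predicted as 0

print(round(accuracy, 4), round(error, 4))
print(round(sensitivity, 4), round(specificity, 4))
```

Out of 74 employees, 70 are classified correctly: 2 of the 33 actual class 0 cases and 2 of the 41 actual class 1 cases are misclassified.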

The next post covers the decision boundary of logistic regression in more detail.
Link to the next post: https://statinfer.com/204-5-2-decision-boundary-logistic-regression/
