
# 204.5.5 Practice : Implementing Intermediate outputs in Python

Link to the previous post : https://statinfer.com/204-5-4-issue-with-non-linear-decision-boundary/

In this post, we implement the concept of intermediate outputs in Python: two logistic regression models are fitted on overlapping subsets of the data, and their predicted probabilities become the inputs of a third, consolidated model.

### Practice : Intermediate output

• Dataset: Emp_Productivity/ Emp_Productivity_All_Sites.csv
• Filter the data and keep the first 74 observations of the dataset. The filter condition is Sample_Set<3.
• Build a logistic regression model to predict Productivity using age and experience.
• Calculate the prediction probabilities for all the inputs. Store the probabilities in inter1 variable.
• Filter the data again and keep the observations from row 34 onwards. The filter condition is Sample_Set>1.
• Build a logistic regression model to predict Productivity using age and experience.
• Calculate the prediction probabilities for all the inputs. Store the probabilities in inter2 variable.
• Build a consolidated model to predict Productivity using the inter1 and inter2 variables.
• Create the confusion matrix and find the accuracy and error rates for the consolidated model.
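
The steps above amount to stacking three logistic regressions. As a sketch in notation that is not part of the original exercise (the coefficient symbols $a_i$, $b_i$, $c_i$ are labels chosen here for illustration), the two intermediate outputs and the consolidated prediction can be written as:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}

\text{inter1} = \sigma\left(a_0 + a_1\,\text{Age} + a_2\,\text{Experience}\right)
\qquad \text{(fitted on Sample\_Set} < 3\text{)}

\text{inter2} = \sigma\left(b_0 + b_1\,\text{Age} + b_2\,\text{Experience}\right)
\qquad \text{(fitted on Sample\_Set} > 1\text{)}

\hat{p}(\text{Productivity} = 1) = \sigma\left(c_0 + c_1\,\text{inter1} + c_2\,\text{inter2}\right)
```

Each of the first two models draws a straight line in the Age/Experience plane; the third model combines the two lines, which lets the consolidated model separate classes that a single line cannot.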

Our sampled data, Emp_Productivity1, holds the first 74 observations. Let's build the first model on this sample (sample-1).

In [20]:
```
# Emp_Productivity_raw is assumed to have been loaded from the CSV in an earlier cell
# Filter the data: keep the rows where Sample_Set < 3
Emp_Productivity1 = Emp_Productivity_raw[Emp_Productivity_raw.Sample_Set < 3]
Emp_Productivity1.shape
```
Out[20]:
`(74, 4)`
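
The two filters in this exercise overlap on purpose. The sketch below uses a synthetic stand-in frame rather than the real CSV, and assumes that Sample_Set only takes the values 1, 2 and 3; the group sizes 33, 41 and 45 are inferred from the row counts 74, 86 and 119 reported in this post, not read from the data.

```python
import pandas as pd

# Synthetic stand-in for Emp_Productivity_raw: only the Sample_Set column matters here.
# The group sizes are an assumption inferred from the row counts in this post.
demo = pd.DataFrame({"Sample_Set": [1] * 33 + [2] * 41 + [3] * 45})

sample1 = demo[demo.Sample_Set < 3]   # rows used to fit the first model
sample2 = demo[demo.Sample_Set > 1]   # rows used to fit the second model
overlap = demo[demo.Sample_Set == 2]  # rows seen by both models

print(sample1.shape[0], sample2.shape[0], overlap.shape[0])
```

Under that assumption, the rows with Sample_Set equal to 2 contribute to both intermediate models.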
In [21]:
```
# Build logistic regression model1 to predict Productivity using Age and Experience
import statsmodels.formula.api as sm
model1 = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity1)
fitted1 = model1.fit()
fitted1.summary()
```
```
Optimization terminated successfully.
Current function value: 0.315987
Iterations 7
```
Out[21]:
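| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | Productivity | No. Observations | 74 |
| Model | Logit | Df Residuals | 71 |
| Method | MLE | Df Model | 2 |
| Date | Tue, 15 Nov 2016 | Pseudo R-squ. | 0.5402 |
| Time | 16:09:16 | Log-Likelihood | -23.383 |
| converged | True | LL-Null | -50.86 |
| | | LLR p-value | 1.167e-12 |

| | coef | std err | z | P>\|z\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|---|
| Intercept | -8.9361 | 2.061 | -4.335 | 0.000 | -12.976 | -4.896 |
| Age | 0.2763 | 0.105 | 2.620 | 0.009 | 0.070 | 0.483 |
| Experience | 0.5923 | 0.298 | 1.988 | 0.047 | 0.008 | 1.176 |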
In [22]:
```
# Draw the decision boundary for this logistic regression model
import matplotlib.pyplot as plt

# Boundary: Intercept + b_Age*Age + b_Exp*Experience = 0, solved for Experience
slope1 = fitted1.params['Age'] / (-fitted1.params['Experience'])
intercept1 = fitted1.params['Intercept'] / (-fitted1.params['Experience'])

fig = plt.figure()
ax1 = fig.add_subplot(111)

ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==0],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax1.scatter(Emp_Productivity1.Age[Emp_Productivity1.Productivity==1],Emp_Productivity1.Experience[Emp_Productivity1.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')

plt.xlim(min(Emp_Productivity1.Age), max(Emp_Productivity1.Age))
plt.ylim(min(Emp_Productivity1.Experience), max(Emp_Productivity1.Experience))
plt.legend(loc='upper left')

x_min, x_max = ax1.get_xlim()
ax1.plot([0, x_max], [intercept1, x_max*slope1+intercept1])
plt.show()
```
In [23]:
```
# Calculate the model1 prediction probabilities for every row of the full dataset and store them in inter1
Emp_Productivity_raw['inter1'] = fitted1.predict(Emp_Productivity_raw[["Age", "Experience"]])
```

For Sample_Set > 1:

In [24]:
```
# Filter the data and keep the observations from row 34 onwards. The filter condition is Sample_Set > 1
Emp_Productivity2 = Emp_Productivity_raw[Emp_Productivity_raw.Sample_Set > 1]
Emp_Productivity2.shape
```
Out[24]:
`(86, 5)`
In [25]:
```
Emp_Productivity2.head()
```
Out[25]:
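| | Age | Experience | Productivity | Sample_Set | inter1 |
|---|---|---|---|---|---|
| 33 | 33.9 | 6.2 | 1 | 2 | 0.983732 |
| 34 | 29.3 | 5.5 | 1 | 2 | 0.918087 |
| 35 | 27.8 | 3.4 | 1 | 2 | 0.680985 |
| 36 | 30.7 | 8.6 | 1 | 2 | 0.990432 |
| 37 | 28.4 | 8.2 | 1 | 2 | 0.977408 |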
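
The inter1 column is just the logistic function applied to the linear predictor of the first model. As a quick sanity check, the sketch below recomputes it by hand for two of the rows shown by `head()`. It uses the rounded coefficients from the model1 summary, so the results agree with the stored values only to about three decimal places; the helper names `logistic` and `inter1_by_hand` are made up for this illustration.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Rounded model1 coefficients copied from the summary output (Intercept, Age, Experience)
B0, B_AGE, B_EXP = -8.9361, 0.2763, 0.5923

def inter1_by_hand(age, experience):
    """Probability of Productivity = 1 according to the first model."""
    return logistic(B0 + B_AGE * age + B_EXP * experience)

# (Age, Experience, inter1 reported by head()) for rows 33 and 35
rows = [(33.9, 6.2, 0.983732), (27.8, 3.4, 0.680985)]
for age, experience, reported in rows:
    print(age, experience, round(inter1_by_hand(age, experience), 4), reported)
```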
In [26]:
```
#### The classification graph
# Draw a scatter plot with Age on the X axis and Experience on the Y axis, using colors and shapes to distinguish the two classes.
fig = plt.figure()
ax2 = fig.add_subplot(111)

ax2.scatter(Emp_Productivity2.Age[Emp_Productivity2.Productivity==0],Emp_Productivity2.Experience[Emp_Productivity2.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax2.scatter(Emp_Productivity2.Age[Emp_Productivity2.Productivity==1],Emp_Productivity2.Experience[Emp_Productivity2.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.xlim(min(Emp_Productivity2.Age), max(Emp_Productivity2.Age))
plt.ylim(min(Emp_Productivity2.Experience), max(Emp_Productivity2.Experience))
plt.legend(loc='upper left')
plt.show()
```
In [27]:
```
# Build a logistic regression model to predict Productivity using Age and Experience on Emp_Productivity2
import statsmodels.formula.api as sm
model2 = sm.logit(formula='Productivity ~ Age+Experience', data=Emp_Productivity2)
fitted2 = model2.fit(method="bfgs")
fitted2.summary()
```
```
Optimization terminated successfully.
Current function value: 0.198139
Iterations: 25
Function evaluations: 28
```
Out[27]:
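| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | Productivity | No. Observations | 86 |
| Model | Logit | Df Residuals | 83 |
| Method | MLE | Df Model | 2 |
| Date | Tue, 15 Nov 2016 | Pseudo R-squ. | 0.7137 |
| Time | 16:09:35 | Log-Likelihood | -17.04 |
| converged | True | LL-Null | -59.518 |
| | | LLR p-value | 3.566e-19 |

| | coef | std err | z | P>\|z\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|---|
| Intercept | 16.3184 | 3.966 | 4.114 | 0.000 | 8.545 | 24.092 |
| Age | -0.3994 | 0.135 | -2.949 | 0.003 | -0.665 | -0.134 |
| Experience | -0.2440 | 0.189 | -1.288 | 0.198 | -0.615 | 0.127 |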
In [28]:
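```
# Covariance matrix of the coefficient estimates (not the coefficients themselves)
cov_matrix = fitted2.normalized_cov_params
print(cov_matrix)
# Slope and intercept of the decision boundary, written as Experience = slope2*Age + intercept2
slope2 = fitted2.params['Age'] / (-fitted2.params['Experience'])
intercept2 = fitted2.params['Intercept'] / (-fitted2.params['Experience'])
```
```
            Intercept       Age  Experience
Intercept   15.730481 -0.497397    0.183390
Age         -0.497397  0.018339   -0.014873
Experience   0.183390 -0.014873    0.035860
```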
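
The slope and intercept come from setting the log-odds to zero, Intercept + b_Age\*Age + b_Exp\*Experience = 0, and solving for Experience. The sketch below checks that relationship with the rounded model2 coefficients from the summary; `boundary_line` is a hypothetical helper written for this illustration, not a statsmodels function.

```python
def boundary_line(b0, b_age, b_exp):
    """Return (slope, intercept) of the line b0 + b_age*Age + b_exp*Experience = 0,
    rewritten as Experience = slope*Age + intercept."""
    return b_age / (-b_exp), b0 / (-b_exp)

# Rounded model2 coefficients from the summary output (Intercept, Age, Experience)
b0, b_age, b_exp = 16.3184, -0.3994, -0.2440
slope, intercept = boundary_line(b0, b_age, b_exp)

# Any point on the line should have log-odds of (approximately) zero,
# which corresponds to a predicted probability of 0.5
age = 38.0
experience = slope * age + intercept
log_odds = b0 + b_age * age + b_exp * experience
print(round(slope, 3), round(intercept, 3), abs(log_odds) < 1e-9)
```

Points on one side of this line get a probability above 0.5 and points on the other side get a probability below it, which is why the line is drawn as the decision boundary.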
In [29]:
```
# Finally, draw the decision boundary for this logistic regression model
import matplotlib.pyplot as plt

fig = plt.figure()
ax2 = fig.add_subplot(111)

ax2.scatter(Emp_Productivity2.Age[Emp_Productivity2.Productivity==0],Emp_Productivity2.Experience[Emp_Productivity2.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax2.scatter(Emp_Productivity2.Age[Emp_Productivity2.Productivity==1],Emp_Productivity2.Experience[Emp_Productivity2.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')
plt.xlim(min(Emp_Productivity2.Age), max(Emp_Productivity2.Age))
plt.ylim(min(Emp_Productivity2.Experience), max(Emp_Productivity2.Experience))
plt.legend(loc='upper left')

x_min, x_max = ax2.get_xlim()
y_min, y_max = ax2.get_ylim()
ax2.plot([x_min, x_max], [x_min*slope2+intercept2, x_max*slope2+intercept2])
plt.show()
```
In [30]:
```
# Calculate the model2 prediction probabilities for every row of the full dataset and store them in inter2
Emp_Productivity_raw['inter2'] = fitted2.predict(Emp_Productivity_raw[["Age", "Experience"]])
```
In [31]:
```
### Confusion matrix, accuracy and error of model2

# Predicted probabilities
predicted_values = fitted2.predict(Emp_Productivity2[["Age", "Experience"]])

# Convert the probabilities to classes using a threshold
threshold = 0.5

import numpy as np
predicted_class = np.zeros(predicted_values.shape)
predicted_class[predicted_values > threshold] = 1

# Rows of the confusion matrix are actual classes, columns are predicted classes
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Productivity2['Productivity'], predicted_class)
print('Confusion Matrix : ', ConfusionMatrix)
accuracy = (ConfusionMatrix[0,0] + ConfusionMatrix[1,1]) / ConfusionMatrix.sum()
print('Accuracy : ', accuracy)

error = 1 - accuracy
print('Error : ', error)
```
```
Confusion Matrix :  [[43  2]
 [ 2 39]]
Accuracy :  0.953488372093
Error :  0.046511627907
```
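
To make the thresholding and the accuracy arithmetic explicit, here is a minimal pure-Python sketch. The helpers `confusion_matrix_2x2` and `accuracy_from_matrix` are hypothetical names written for this illustration (the cell above uses `sklearn.metrics.confusion_matrix`), and the toy labels and probabilities are made up.

```python
def confusion_matrix_2x2(actual, probabilities, threshold=0.5):
    """Rows are actual classes (0, 1); columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for y, p in zip(actual, probabilities):
        predicted = 1 if p > threshold else 0
        matrix[y][predicted] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Share of observations on the main diagonal (correct predictions)."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical toy data, only to show the thresholding step
toy = confusion_matrix_2x2([0, 0, 1, 1, 1], [0.1, 0.7, 0.4, 0.8, 0.9])
print(toy, accuracy_from_matrix(toy))

# The confusion matrix printed above for model2: 82 of 86 rows are on the diagonal
reported = [[43, 2], [2, 39]]
print(accuracy_from_matrix(reported), 1 - accuracy_from_matrix(reported))
```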

Now that both models have been built, let's combine them.

In [32]:
```
# Plot the two new columns, inter1 and inter2, against each other
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)

ax.scatter(Emp_Productivity_raw.inter1[Emp_Productivity_raw.Productivity==0],Emp_Productivity_raw.inter2[Emp_Productivity_raw.Productivity==0], s=50, c='b', marker="o", label='Productivity 0')
ax.scatter(Emp_Productivity_raw.inter1[Emp_Productivity_raw.Productivity==1],Emp_Productivity_raw.inter2[Emp_Productivity_raw.Productivity==1], s=50, c='r', marker="+", label='Productivity 1')

plt.xlim(min(Emp_Productivity_raw.inter1), max(Emp_Productivity_raw.inter1)+0.2)
plt.ylim(min(Emp_Productivity_raw.inter2), max(Emp_Productivity_raw.inter2)+0.2)

plt.legend(loc='lower left')
plt.show()
```
In [33]:
```
### Logistic regression model with the intermediate outputs as inputs
import statsmodels.formula.api as sm

model_combined = sm.logit(formula='Productivity ~ inter1+inter2', data=Emp_Productivity_raw)
fitted_combined = model_combined.fit(method="bfgs")
fitted_combined.summary()
```
```
Optimization terminated successfully.
Current function value: 0.208985
Iterations: 26
Function evaluations: 27
```
Out[33]:
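| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | Productivity | No. Observations | 119 |
| Model | Logit | Df Residuals | 116 |
| Method | MLE | Df Model | 2 |
| Date | Tue, 15 Nov 2016 | Pseudo R-squ. | 0.6805 |
| Time | 16:09:54 | Log-Likelihood | -24.869 |
| converged | True | LL-Null | -77.848 |
| | | LLR p-value | 9.805e-24 |

| | coef | std err | z | P>\|z\| | 95.0% Conf. Int. (lower) | 95.0% Conf. Int. (upper) |
|---|---|---|---|---|---|---|
| Intercept | -12.2134 | 1.907 | -6.405 | 0.000 | -15.951 | -8.476 |
| inter1 | 8.0193 | 1.409 | 5.693 | 0.000 | 5.258 | 10.780 |
| inter2 | 8.5983 | 1.509 | 5.697 | 0.000 | 5.640 | 11.556 |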
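
To see how the three fitted models chain together, the sketch below pushes one observation (row 33 from the earlier `head()` output: Age 33.9, Experience 6.2, Productivity 1) through both intermediate models and then through the consolidated model. It relies on the rounded coefficients reported in the three summaries of this post, so the numbers are approximate; the function names are hypothetical.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Rounded coefficients copied from the three summaries in this post
MODEL1 = (-8.9361, 0.2763, 0.5923)     # Intercept, Age, Experience
MODEL2 = (16.3184, -0.3994, -0.2440)   # Intercept, Age, Experience
COMBINED = (-12.2134, 8.0193, 8.5983)  # Intercept, inter1, inter2

def predict(coefs, x1, x2):
    """Probability from a two-input logistic regression."""
    b0, b1, b2 = coefs
    return logistic(b0 + b1 * x1 + b2 * x2)

def consolidated_probability(age, experience):
    """Stage 1: two intermediate outputs. Stage 2: the consolidated model on top of them."""
    inter1 = predict(MODEL1, age, experience)
    inter2 = predict(MODEL2, age, experience)
    return inter1, inter2, predict(COMBINED, inter1, inter2)

inter1, inter2, p = consolidated_probability(33.9, 6.2)
predicted_class = 1 if p > 0.5 else 0
print(round(inter1, 3), round(inter2, 3), round(p, 3), predicted_class)
```

For this row the consolidated probability is well above the 0.5 threshold, so the predicted class matches the recorded Productivity of 1.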
In [34]:
```
# Covariance matrix of the coefficient estimates (not the coefficients themselves)
cov_matrix = fitted_combined.normalized_cov_params
print(cov_matrix)
# Slope and intercept of the decision boundary, written as inter2 = slope_combined*inter1 + intercept_combined
slope_combined = fitted_combined.params['inter1'] / (-fitted_combined.params['inter2'])
intercept_combined = fitted_combined.params['Intercept'] / (-fitted_combined.params['inter2'])
```
```
           Intercept    inter1    inter2
Intercept   3.635572 -2.326774 -2.637054
inter1     -2.326774  1.984538  1.413297
inter2     -2.637054  1.413297  2.277541
```
In [35]:
```
# Finally, draw the decision boundary of the consolidated model in the inter1/inter2 plane
import matplotlib.pyplot as plt

fig = plt.figure()
ax2 = fig.add_subplot(111)

ax2.scatter(Emp_Productivity_raw.inter1[Emp_Productivity_raw.Productivity==0],Emp_Productivity_raw.inter2[Emp_Productivity_raw.Productivity==0], s=10, c='b', marker="o", label='Productivity 0')
ax2.scatter(Emp_Productivity_raw.inter1[Emp_Productivity_raw.Productivity==1],Emp_Productivity_raw.inter2[Emp_Productivity_raw.Productivity==1], s=10, c='r', marker="+", label='Productivity 1')

plt.xlim(min(Emp_Productivity_raw.inter1), max(Emp_Productivity_raw.inter1)+0.2)
plt.ylim(min(Emp_Productivity_raw.inter2), max(Emp_Productivity_raw.inter2)+0.2)

plt.legend(loc='lower left')

x_min, x_max = ax2.get_xlim()
y_min, y_max = ax2.get_ylim()
ax2.plot([x_min, x_max], [x_min*slope_combined+intercept_combined, x_max*slope_combined+intercept_combined])
plt.show()
```
In [36]:
```
#### Confusion matrix, accuracy and error of the consolidated model

# Predicted probabilities
predicted_values = fitted_combined.predict(Emp_Productivity_raw[["inter1", "inter2"]])

# Convert the probabilities to classes using a threshold
threshold = 0.5

import numpy as np
predicted_class = np.zeros(predicted_values.shape)
predicted_class[predicted_values > threshold] = 1

# Rows of the confusion matrix are actual classes, columns are predicted classes
from sklearn.metrics import confusion_matrix as cm
ConfusionMatrix = cm(Emp_Productivity_raw['Productivity'], predicted_class)
print('Confusion Matrix : ', ConfusionMatrix)

accuracy = (ConfusionMatrix[0,0] + ConfusionMatrix[1,1]) / ConfusionMatrix.sum()
print('Accuracy : ', accuracy)

error = 1 - accuracy
print('Error : ', error)
```
```
Confusion Matrix :  [[74  2]
 [ 4 39]]
Accuracy :  0.949579831933
Error :  0.0504201680672
```

The consolidated model, built on the two intermediate outputs, classifies 113 of the 119 observations correctly, an accuracy of 94.96%.

The next post is about neural network intuition.

Link to the next post : https://statinfer.com/204-5-6-neural-network-intuition/
