Statinfer

203.5.5 Practice : Implementing Intermediate outputs in R

Putting intermediate outputs into practice.

In previous section, we studied about  Issue with Non Linear Decision Boundary

In this post we will learn how to implement the concept of intermediate outputs using R. We will cover many things in this session.

  • Dataset: Emp_Productivity/ Emp_Productivity_All_Sites.csv
  • Filter the data and take first 74 observations from above dataset . Filter condition is Sample_Set<3
  • Build a logistic regression model to predict Productivity using age and experience
  • Calculate the prediction probabilities for all the inputs. Store the probabilities in inter1 variable
  • Filter the data and take observations from row 34 onwards. Filter condition is Sample_Set<1
  • Build a logistic regression model to predict Productivity using age and experience
  • Calculate the prediction probabilities for all the inputs. Store the probabilities in inter2 variable
  • Build a consolidated model to predict productivity using inter-1 and inter-2 variables
  • Create the confusion matrix and find the accuracy and error rates for the consolidated model

Our sampled data Emp_Productivity1 has first 74 observations. Lets build the model on this sample data(sample-1)

####The clasification graph Sample-1
library(ggplot2)
ggplot(Emp_Productivity1)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)

###Logistic Regerssion model1
Emp_Productivity_logit<-glm(Productivity~Age+Experience,data=Emp_Productivity1, family=binomial())

coef(Emp_Productivity_logit)
## (Intercept)         Age  Experience 
##  -8.9361114   0.2762749   0.5923444
slope1 <- coef(Emp_Productivity_logit)[2]/(-coef(Emp_Productivity_logit)[3])
intercept1 <- coef(Emp_Productivity_logit)[1]/(-coef(Emp_Productivity_logit)[3]) 
####Decision boundary for model1 built on Sample-1
library(ggplot2)
base<-ggplot(Emp_Productivity1)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)
base+geom_abline(intercept = intercept1 , slope = slope1, color = "red", size = 2)

#Base is the scatter plot. Then we are adding the decision boundary
#Filter the data and take observations from row 34 onwards. 
Emp_Productivity2<-Emp_Productivity_raw[Emp_Productivity_raw$Sample_Set>1,]
####The clasification graph
library(ggplot2)
ggplot(Emp_Productivity2)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)

###Logistic Regerssion model2 built on Sample2
Emp_Productivity_logit2<-glm(Productivity~Age+Experience, data=Emp_Productivity2, family=binomial())
Emp_Productivity_logit2
## 
## Call:  glm(formula = Productivity ~ Age + Experience, family = binomial(), 
##     data = Emp_Productivity2)
## 
## Coefficients:
## (Intercept)          Age   Experience  
##     16.3184      -0.3994      -0.2440  
## 
## Degrees of Freedom: 85 Total (i.e. Null);  83 Residual
## Null Deviance:       119 
## Residual Deviance: 34.08     AIC: 40.08
coef(Emp_Productivity_logit2)
## (Intercept)         Age  Experience 
##  16.3183916  -0.3994172  -0.2439643
slope3 <- coef(Emp_Productivity_logit2)[2]/(-coef(Emp_Productivity_logit2)[3])
intercept3 <- coef(Emp_Productivity_logit2)[1]/(-coef(Emp_Productivity_logit2)[3]) 
####Drawing the Decison boundry
library(ggplot2)
base<-ggplot(Emp_Productivity2)+geom_point(aes(x=Age,y=Experience,color=factor(Productivity),shape=factor(Productivity)),size=5)
base+geom_abline(intercept = intercept3 , slope = slope3, color = "red", size = 2) 

####Accuracy of the model2
predicted_values<-round(predict(Emp_Productivity_logit2,type="response"),0)
conf_matrix<-table(predicted_values,Emp_Productivity_logit2$y)
conf_matrix
##                 
## predicted_values  0  1
##                0 43  2
##                1  2 39
accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy
## [1] 0.9534884
#Calculate the prediction probabilities for all the inputs. Store the probabilities in inter1 variable
Emp_Productivity_raw$inter1<-predict(Emp_Productivity_logit,type="response", newdata=Emp_Productivity_raw)

#Calculate the prediction probabilities for all the inputs. Store the probabilities in inter2 variable
Emp_Productivity_raw$inter2<-predict(Emp_Productivity_logit2,type="response", newdata=Emp_Productivity_raw)

head(Emp_Productivity_raw)
##    Age Experience Productivity Sample_Set     inter1    inter2
## 1 20.0        2.3            0          1 0.11423230 0.9995775
## 2 16.2        2.2            0          1 0.04080461 0.9999096
## 3 20.2        1.8            0          1 0.09202657 0.9995949
## 4 18.8        1.4            0          1 0.05152147 0.9997899
## 5 18.9        3.2            0          1 0.13955234 0.9996608
## 6 16.7        3.9            0          1 0.11793035 0.9998329
####Clasification graph with the two new coloumns
library(ggplot2)
ggplot(Emp_Productivity_raw)+geom_point(aes(x=inter1,y=inter2,color=factor(Productivity),shape=factor(Productivity)),size=5)

###Logistic Regerssion model with Intermediate outputs as input
Emp_Productivity_logit_combined<-glm(Productivity~inter1+inter2,data=Emp_Productivity_raw, family=binomial())
Emp_Productivity_logit_combined
## 
## Call:  glm(formula = Productivity ~ inter1 + inter2, family = binomial(), 
##     data = Emp_Productivity_raw)
## 
## Coefficients:
## (Intercept)       inter1       inter2  
##     -12.213        8.019        8.598  
## 
## Degrees of Freedom: 118 Total (i.e. Null);  116 Residual
## Null Deviance:       155.7 
## Residual Deviance: 49.74     AIC: 55.74
slope4 <- coef(Emp_Productivity_logit_combined)[2]/(-coef(Emp_Productivity_logit_combined)[3])
intercept4<- coef(Emp_Productivity_logit_combined)[1]/(-coef(Emp_Productivity_logit_combined)[3]) 

####Drawing the Decison boundry
library(ggplot2)
base<-ggplot(Emp_Productivity_raw)+geom_point(aes(x=inter1,y=inter2,color=factor(Productivity),shape=factor(Productivity)),size=5)
base+geom_abline(intercept = intercept4 , slope = slope4, colour = "red", size = 2) 

####Accuracy of the combined
predicted_values<-round(predict(Emp_Productivity_logit_combined,type="response"),0)
conf_matrix<-table(predicted_values,Emp_Productivity_logit_combined$y)
conf_matrix
##                 
## predicted_values  0  1
##                0 74  4
##                1  2 39
accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy
## [1] 0.9495798

We got an accuracy of 94.95% with an Intermediate model.

The next post is about Neural Network Intuition.

0 responses on "203.5.5 Practice : Implementing Intermediate outputs in R"

Leave a Message

Blog Posts

Hurry up!!!

"use coupon code for FLAT 30% discount"  datascientistoffer        ___________________________________      Subscribe to our youtube channel. Get access to video tutorials.                

Contact Us

Statinfer Software Solutions#647 2nd floor 1st Main, Indira Nagar 1st Stage, 100 feet road,Indranagar Bangalore,Karnataka, Pin code:-560038 Landmarks: Opp. Namma Metro Pillar 48.

Connect with us

linkin fn twitter g

How to become a Data Scientist.?

top