
203.5.2 Decision Boundary – Logistic Regression

Decision Boundary

In the previous section, we studied Neural Networks: A Recap of Logistic Regression.

Decision Boundary – Logistic Regression

  • The line or margin that separates the classes.
  • Classification algorithms are all about finding the decision boundaries.
  • It need not always be a straight line.
  • The final function of our decision boundary looks like:
    • Y=1 if \(w^Tx+w_0>0\); else Y=0
  • In logistic regression, it can be derived from the logistic regression coefficients and the threshold.
    • Imagine the logistic regression line \(p(y)=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\)
    • Suppose \(p(y)>0.5\) means class-1; otherwise class-0. At the threshold:
      • \(\log\left(\frac{y}{1-y}\right)=b_0+b_1x_1+b_2x_2\)
      • \(\log\left(\frac{0.5}{0.5}\right)=b_0+b_1x_1+b_2x_2\)
      • \(0=b_0+b_1x_1+b_2x_2\)
      • \(b_0+b_1x_1+b_2x_2=0\) is the line
    • Rewriting it in mx+c form:
      • \(x_2=\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\)
    • Anything above this line is class-1; anything below it is class-0:
      • \(x_2>\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is class-1
      • \(x_2<\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is class-0
      • \(x_2=\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is a tie, with probability 0.5
    • We can change the decision boundary by changing the threshold value (here 0.5)
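The derivation above can be sketched numerically. The Python snippet below fits a logistic regression on toy two-dimensional data and recovers the boundary's slope and intercept from the fitted coefficients; the data and variable names are illustrative, not from the lab.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 2-D data (illustrative): class-1 points lie above the line x2 = x1
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(int)

model = LogisticRegression().fit(X, y)
b0 = model.intercept_[0]
b1, b2 = model.coef_[0]

# Boundary in mx + c form: x2 = (-b1/b2) x1 + (-b0/b2)
slope = -b1 / b2
intercept = -b0 / b2
print(slope, intercept)  # slope should be close to 1 for this toy data
```

For this toy data the true separating line is \(x_2=x_1\), so the recovered slope lands near 1 and the intercept near 0.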

LAB: Decision Boundary

  • Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
  • Build a logistic regression model to predict Productivity using age and experience
  • Finally draw the decision boundary for this logistic regression model
  • Create the confusion matrix
  • Calculate the accuracy and error rates

Solution

  • Drawing the decision boundary for the logistic regression model
library(ggplot2)

# Boundary line from the model coefficients: x2 = (-b1/b2)x1 + (-b0/b2)
# (assumes the model was fit as Productivity ~ Age + Experience)
slope1 <- -coef(Emp_Productivity_logit)[2] / coef(Emp_Productivity_logit)[3]
intercept1 <- -coef(Emp_Productivity_logit)[1] / coef(Emp_Productivity_logit)[3]

# Base is the scatter plot; then we add the decision boundary on top
base <- ggplot(Emp_Productivity1) +
  geom_point(aes(x = Age, y = Experience,
                 color = factor(Productivity),
                 shape = factor(Productivity)), size = 5)
base + geom_abline(intercept = intercept1, slope = slope1, color = "red", size = 2)
  • Accuracy of the model
predicted_values<-round(predict(Emp_Productivity_logit,type="response"),0)
conf_matrix<-table(predicted_values,Emp_Productivity_logit$y)
conf_matrix
##                 
## predicted_values  0  1
##                0 31  2
##                1  2 39
accuracy<-(conf_matrix[1,1]+conf_matrix[2,2])/(sum(conf_matrix))
accuracy
## [1] 0.9459459
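The accuracy arithmetic above can be mirrored in a short Python sketch, using the confusion matrix values printed by the R output:

```python
# Confusion matrix from the R output above (rows: predicted, cols: actual)
conf_matrix = [[31, 2],
               [2, 39]]

# Accuracy = (true negatives + true positives) / total observations
correct = conf_matrix[0][0] + conf_matrix[1][1]
total = sum(sum(row) for row in conf_matrix)
accuracy = correct / total
print(round(accuracy, 7))  # 0.9459459
```

The error rate is simply 1 - accuracy, here about 0.054.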

New Representation for Logistic Regression

\[y=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\]
\[y=\frac{1}{1+e^{-(b_0+b_1x_1+b_2x_2)}}\]
\[y=g(w_0+w_1x_1+w_2x_2), \text{ where } g(x)=\frac{1}{1+e^{-x}}\]
\[y=g\left(\sum_k w_kx_k\right)\]
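A quick numerical check (in Python) confirms that the two forms above are the same sigmoid, differing only by multiplying through by \(e^{-z}\):

```python
import math

def form1(z):
    # e^z / (1 + e^z)
    return math.exp(z) / (1 + math.exp(z))

def form2(z):
    # 1 / (1 + e^(-z))
    return 1 / (1 + math.exp(-z))

# Both forms agree at every point
for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(form1(z) - form2(z)) < 1e-12

print(form2(0.0))  # 0.5: the decision threshold sits at z = 0
```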

Finding the weights in logistic regression

\(out(x)=y=g\left(\sum_k w_kx_k\right)\)

The above output is a nonlinear function of a linear combination of the inputs: a typical multiple logistic regression line.

We find w to minimize \(\sum_{i=1}^n \left[y_i - g\left(\sum_k w_kx_k\right)\right]^2\)
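A minimal gradient-descent sketch of this squared-error objective is below. Note that standard practice fits logistic regression by maximizing the likelihood (minimizing cross-entropy); squared error is used here only because it is the criterion the text states. The toy data and learning rate are illustrative assumptions.

```python
import numpy as np

def g(z):
    # Sigmoid activation
    return 1 / (1 + np.exp(-z))

# Toy data: one feature plus a leading 1-column for w0; targets in {0, 1}
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=(100, 1))
X = np.hstack([np.ones((100, 1)), x])
y_true = (x[:, 0] > 0).astype(float)

w = np.zeros(2)
lr = 0.5
for _ in range(2000):
    p = g(X @ w)
    # Gradient of sum_i [y_i - g(Xw)]^2 with respect to w
    grad = -2 * X.T @ ((y_true - p) * p * (1 - p))
    w -= lr * grad / len(y_true)

preds = (g(X @ w) > 0.5).astype(float)
print((preds == y_true).mean())  # training accuracy of the fitted weights
```

After training, the boundary \(\sum_k w_kx_k=0\) sits close to the true split at x = 0.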

The next post is a practice session on Non Linear Decision Boundary.

Statinfer

Statinfer is derived from "statistical inference". We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.
