Classification algorithms are all about finding decision boundaries between classes
The boundary need not always be a straight line
The decision rule defined by this boundary looks like
\(Y=1\) if \(w^Tx+w_0>0\); else \(Y=0\)
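As a quick illustration, here is a minimal NumPy sketch of this rule; the weights \(w\) and \(w_0\) below are made-up values for demonstration, not fitted from data.

```python
import numpy as np

# Linear decision rule: Y = 1 if w^T x + w_0 > 0, else 0
# (illustrative weights, not fitted from data)
w = np.array([0.8, -0.5])   # w, one weight per feature
w0 = 0.2                    # w_0 (intercept / bias)

X = np.array([[1.0, 0.3],
              [0.1, 2.0]])  # two example points, one per row

scores = X @ w + w0                 # w^T x + w_0 for each point
y_pred = (scores > 0).astype(int)   # 1 if the score is positive, else 0
print(y_pred)                       # -> [1 0]
```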
In logistic regression, the decision boundary can be derived from the fitted coefficients and the probability threshold.
Imagine the logistic regression line \(p(y)=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\)
Suppose we predict class-1 if \(p(y)>0.5\) and class-0 otherwise; the boundary is where \(p(y)=0.5\)
\(\log\left(\frac{p(y)}{1-p(y)}\right)=b_0+b_1x_1+b_2x_2\)
\(\log\left(\frac{0.5}{0.5}\right)=b_0+b_1x_1+b_2x_2\)
\(0=b_0+b_1x_1+b_2x_2\)
\(b_0+b_1x_1+b_2x_2=0\) is the line
Rewriting it in the \(y=mx+c\) form
\(x_2=\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\)
Anything above this line is class-1; anything below it is class-0
\(x_2>\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is class-1
\(x_2<\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is class-0
\(x_2=\left(-\frac{b_1}{b_2}\right)x_1+\left(-\frac{b_0}{b_2}\right)\) is the tie, where the probability is exactly 0.5
We can shift the decision boundary by changing the threshold value (here 0.5)
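This calculation translates directly into code. Below is a small sketch that turns coefficients \(b_0, b_1, b_2\) into the slope and intercept of the boundary line and shows how changing the threshold moves it; the coefficient values are assumed for illustration, not taken from any real model.

```python
import numpy as np

# Decision boundary from logistic regression coefficients:
# x2 = (-b1/b2) * x1 + (log_odds(threshold) - b0) / b2
b0, b1, b2 = -4.0, 0.5, 1.0   # illustrative coefficients

def boundary_line(threshold=0.5):
    """Slope and intercept of the decision boundary for a given probability threshold."""
    log_odds = np.log(threshold / (1 - threshold))  # equals 0 when threshold = 0.5
    slope = -b1 / b2
    intercept = (log_odds - b0) / b2
    return slope, intercept

print(boundary_line(0.5))   # boundary where p(y) = 0.5
print(boundary_line(0.7))   # raising the threshold shifts the line, as noted above
```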
LAB: Decision Boundary
Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Distinguish the two classes with colors or shapes (visualizing the classes)
Build a logistic regression model to predict Productivity using Age and Experience
Finally, draw the decision boundary for this logistic regression model
Create the confusion matrix
Calculate the accuracy and error rates
Solution
Drawing the Decision boundary for the logistic regression model
\[y=\frac{e^{b_0+b_1x_1+b_2x_2}}{1+e^{b_0+b_1x_1+b_2x_2}}\]
\[y=\frac{1}{1+e^{-(b_0+b_1x_1+b_2x_2)}}\]
\[y=g(w_0+w_1x_1+w_2x_2)\text{, where }g(x)=\frac{1}{1+e^{-x}}\]
\[y=g\left(\sum_k w_kx_k\right)\]
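A possible sketch of the lab steps with scikit-learn is shown below. It assumes a pandas DataFrame with columns Age, Experience and a binary Productivity label; the file name productivity.csv is hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

data = pd.read_csv("productivity.csv")      # hypothetical file with Age, Experience, Productivity
X = data[["Age", "Experience"]]
y = data["Productivity"]

# Scatter plot of the two classes (Age on X-axis, Experience on Y-axis)
plt.scatter(data["Age"], data["Experience"], c=y, cmap="bwr", edgecolor="k")
plt.xlabel("Age")
plt.ylabel("Experience")

# Fit the logistic regression model and pull out b0, b1, b2
model = LogisticRegression().fit(X, y)
b0 = model.intercept_[0]
b1, b2 = model.coef_[0]

# Decision boundary: x2 = (-b1/b2) * x1 + (-b0/b2)
x1 = np.linspace(data["Age"].min(), data["Age"].max(), 100)
plt.plot(x1, (-b1 / b2) * x1 + (-b0 / b2), "k--")
plt.show()

# Confusion matrix, accuracy and error rate
y_pred = model.predict(X)
cm = confusion_matrix(y, y_pred)
acc = accuracy_score(y, y_pred)
print(cm)
print("Accuracy:", acc, "Error rate:", 1 - acc)
```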
Finding the weights in logistic regression
\(\text{out}(x)=g\left(\sum_k w_kx_k\right)\)
The above output is a non-linear function of a linear combination of the inputs: a typical multiple logistic regression line
We find \(w\) to minimize \(\sum_{i=1}^{n}\left[y_i-g\left(\sum_k w_kx_{ik}\right)\right]^2\)
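As a rough illustration of this squared-error formulation, here is a small gradient-descent sketch on toy data. The data, learning rate, and iteration count are assumptions, and standard library implementations usually fit the weights differently (by maximum likelihood); this is only meant to mirror the objective written above.

```python
import numpy as np

# Minimize sum_i [y_i - g(sum_k w_k x_ik)]^2 by gradient descent, with g the sigmoid.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])  # x_0 = 1 carries w_0
true_w = np.array([0.5, 2.0, -1.0])
y = (sigmoid(X @ true_w) > 0.5).astype(float)                   # toy 0/1 labels

w = np.zeros(3)
lr = 0.5
for _ in range(5000):
    p = sigmoid(X @ w)                                   # g(sum_k w_k x_ik) for every row
    grad = -2 * X.T @ ((y - p) * p * (1 - p)) / len(y)   # gradient of the (mean) squared error
    w -= lr * grad

print("fitted weights:", w)   # direction should roughly match true_w
```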