Link to the previous post : https://statinfer.com/204-2-1-logistic-regression-why-do-we-need-it/
In the last post we saw that linear regression cannot be used when the final output is binary (yes or no), because it is hard to fit a binary output with a linear function.
To solve this problem we can turn to a different kind of function, with the logistic function being the first choice.
A Logistic Function
This is what a logistic function looks like:
The Logistic function
- We want a model that predicts probabilities between 0 and 1, which calls for an S-shaped curve.
- There are many S-shaped curves. We use the logistic model:
$$\text{Probability} = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$
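To get a feel for the S shape, the logistic model above can be evaluated by hand. The following is a minimal sketch in plain Python; the function names `logistic` and `probability` and the parameter names `b0` and `b1` (standing in for β0 and β1) are illustrative choices, not part of any library.

```python
import math

def logistic(z):
    """Return e^z / (1 + e^z), written so large |z| does not overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def probability(x, b0, b1):
    """Probability from the logistic model with intercept b0 and slope b1."""
    return logistic(b0 + b1 * x)

# Close to 0 for very negative inputs, exactly 0.5 at 0, close to 1 for large inputs
for z in (-10, -2, 0, 2, 10):
    print(z, round(logistic(z), 4))
```

Whatever values β0 and β1 take, the output always stays between 0 and 1, which is what makes it usable as a probability.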
Logistic Regression Output
- In logistic regression, we predict a probability instead of the value of Y directly.
- Y is binary and takes only two values, 1 and 0. Instead of predicting 1 or 0, we predict the probability of 1 and the probability of 0.
- This suits binary categorical outputs such as YES vs NO, WIN vs LOSS, or Fraud vs Non-Fraud.
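The relationship between the two probabilities and the final YES/NO answer can be sketched as follows. The helper names `class_probabilities` and `to_label` are hypothetical, and the 0.5 cutoff is an assumed default rather than something fixed by the model.

```python
def class_probabilities(p_one):
    """Given P(Y=1), return (P(Y=0), P(Y=1)); the two always sum to 1."""
    if not 0.0 <= p_one <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return 1.0 - p_one, p_one

def to_label(p_one, cutoff=0.5):
    """Turn P(Y=1) into a 0/1 label; the 0.5 cutoff is an assumed default."""
    return 1 if p_one > cutoff else 0

p_no, p_yes = class_probabilities(0.8)
print(p_no, p_yes, to_label(p_yes))
```

A different cutoff may be more appropriate when one kind of mistake is costlier than the other, for example in fraud detection.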
Practice : Logistic Regression
- Dataset: Product Sales Data/Product_sales.csv
- Build a logistic regression line between Age and buying
- Will a 4-year-old customer buy the product?
- If Age is 105, will that customer buy the product?
In [8]:
import pandas as pd
sales=pd.read_csv("datasets\\Product Sales Data\\Product_sales.csv")
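# Logit is exposed by statsmodels.api (statsmodels.formula.api provides the lowercase formula interface)
import statsmodels.api as sm
# Build a logistic regression line between Age and buying
# Note: as written, no intercept (β0) is fitted; wrap the predictor in sm.add_constant(...) to include one
logit=sm.Logit(sales['Bought'],sales['Age'])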
logit
Out[8]:
In [9]:
result = logit.fit()
result
Out[9]:
In [10]:
result.summary()
Out[10]:
In [11]:
# Confidence interval of each coefficient
print(result.conf_int())
In [12]:
#One more way of fitting the model
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(sales[["Age"]],sales["Bought"])
Out[12]:
In [13]:
# Will a 4-year-old customer buy the product?
age1=4
# predict expects a 2-D input (one row per customer), not a bare number
predict_age1=logistic.predict(pd.DataFrame({"Age":[age1]}))
print(predict_age1)
In [14]:
# If Age is 105, will that customer buy the product?
age2=105
# predict expects a 2-D input (one row per customer), not a bare number
predict_age2=logistic.predict(pd.DataFrame({"Age":[age2]}))
print(predict_age2)
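To see why the two customers above can receive different answers, the prediction can be reproduced by hand from an intercept and a slope. The sketch below uses hypothetical coefficients chosen only for illustration; they are not the values fitted on Product_sales.csv. With scikit-learn, the fitted values for a single predictor would be read from `logistic.intercept_[0]` and `logistic.coef_[0][0]`.

```python
import math

# Hypothetical coefficients for illustration only; NOT fitted on Product_sales.csv
intercept = -7.5
slope = 0.25

def prob_buy(age):
    """Probability of buying under the logistic model."""
    z = intercept + slope * age
    return 1.0 / (1.0 + math.exp(-z))

def will_buy(age):
    """Class 1 when the linear score is positive, i.e. when the probability exceeds 0.5."""
    return 1 if intercept + slope * age > 0 else 0

# Age at which the probability crosses 0.5 (linear score equals zero)
boundary_age = -intercept / slope

for age in (4, 105):
    print(age, round(prob_buy(age), 4), will_buy(age))
print("boundary age:", boundary_age)
```

With a positive slope, every age below the boundary is predicted as 0 and every age above it as 1, so the answer for ages 4 and 105 depends only on which side of the boundary they fall.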