
204.2.1 Logistic Regression: why do we need it?

What happens if Linear Regression doesn't work?

In this series we will explore logistic regression models. To start, we will recap linear regression and see whether it works in every situation.

Practice: Why do we need logistic regression?

  • Dataset: Product Sales Data/Product_sales.csv
  • What are the variables in the dataset?
  • Build a predictive model for Bought vs Age
  • What is R-Square?
  • If Age is 4 then will that customer buy the product?
  • If Age is 105 then will that customer buy the product?
In [2]:
import pandas as pd
sales = pd.read_csv("datasets/Product Sales Data/Product_sales.csv")
In [3]:
#What are the variables in the dataset? 
sales.columns.values
Out[3]:
array(['Age', 'Bought'], dtype=object)
In [4]:
#Build a predictive model for Bought vs Age

### We need the statsmodels package, which makes many statistical methods available in Python
from statsmodels.formula.api import ols
model = ols(formula='Bought ~ Age', data=sales)
fitted = model.fit()
fitted.summary()
Out[4]:
OLS Regression Results

Dep. Variable:    Bought            R-squared:          0.842
Model:            OLS               Adj. R-squared:     0.842
Method:           Least Squares     F-statistic:        2480.
Date:             Sun, 16 Oct 2016  Prob (F-statistic): 1.63e-188
Time:             14:35:39          Log-Likelihood:     95.589
No. Observations: 467               AIC:                -187.2
Df Residuals:     465               BIC:                -178.9
Df Model:         1
Covariance Type:  nonrobust

              coef   std err        t    P>|t|  [95.0% Conf. Int.]
Intercept  -0.1704     0.015  -11.156    0.000    -0.200    -0.140
Age         0.0209     0.000   49.803    0.000     0.020     0.022

Omnibus:        77.279  Durbin-Watson:     1.362
Prob(Omnibus):   0.000  Jarque-Bera (JB):  1022.092
Skew:            0.056  Prob(JB):          1.14e-222
Kurtosis:       10.247  Cond. No.          60.7
In [5]:
#What is R-Square?
fitted.rsquared
Out[5]:
0.84212439295277375
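To make that number concrete, here is a small self-contained check of the definition R² = 1 − SS_residual / SS_total. It uses synthetic Age/Bought data in place of Product_sales.csv (in case the file is not at hand), so the exact value will differ from the 0.842 above:

```python
import numpy as np

# Synthetic stand-in for the sales data (the real CSV may not be available here):
# a binary outcome that rises with age, mimicking Bought ~ Age.
rng = np.random.default_rng(0)
age = rng.integers(6, 81, size=200)
bought = (age > 40).astype(float)

# Fit a simple least-squares line y = b0 + b1 * x
b1, b0 = np.polyfit(age, bought, 1)
pred = b0 + b1 * age

# R-squared = 1 - SS_residual / SS_total
ss_res = np.sum((bought - pred) ** 2)
ss_tot = np.sum((bought - bought.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))
```

A high R² alone does not mean the model is appropriate, as the predictions below will show.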
In [6]:
#If Age is 4 then will that customer buy the product?

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(sales[["Age"]], sales[["Bought"]])

# predict() expects a 2-D array: one row per sample, one column per feature
age1 = [[4]]
predict1 = lr.predict(age1)
predict1
Out[6]:
array([[-0.08664394]])
In [7]:
age2 = [[105]]
predict2 = lr.predict(age2)
predict2
Out[7]:
array([[ 2.02851132]])

Something went wrong

  • The model that we built above is not right.
  • There are issues with the type of the dependent variable.
  • The dependent variable is not continuous; it is binary.
  • We can’t fit a linear regression line to this data: the predictions above (-0.087 and 2.03) fall outside the valid 0–1 range.

Why not linear ?

  • Consider the product sales data. The dataset has two columns.
    • Age – a continuous variable between 6 and 80
    • Bought – binary (1 – Yes; 0 – No)

Real-life examples

  • Gaming – Win vs. Loss
  • Sales – Buying vs. Not buying
  • Marketing – Response vs. No Response
  • Credit card & Loans – Default vs. Non Default
  • Operations – Attrition vs. Retention
  • Websites – Click vs. No click
  • Fraud identification – Fraud vs. Non Fraud
  • Healthcare – Cure vs. No Cure

The output of these nonlinear relationships cannot be captured by a linear model.

Some Nonlinear Functions
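The most common of these is the logistic (sigmoid) function, σ(x) = 1 / (1 + e^(−x)), which maps any real number into the interval (0, 1): exactly the range a probability needs. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Extreme inputs are squashed toward 0 or 1, never beyond
for x in [-10, -1, 0, 1, 10]:
    print(x, round(sigmoid(x), 4))
```

This is the function that logistic regression wraps around the linear predictor b0 + b1 * Age.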

Statinfer

Statinfer is derived from "statistical inference". We provide training in various data analytics and data science courses and assist candidates in securing placements.
