
204.2.1 Logistic Regression, why do we need it?

In this series we will explore Logistic Regression models. To start, we will recap Linear Regression and see whether it works in all situations.

Practice: Why do we need logistic regression?

  • Dataset: Product Sales Data/Product_sales.csv
  • What are the variables in the dataset?
  • Build a predictive model for Bought vs Age
  • What is R-Square?
  • If Age is 4 then will that customer buy the product?
  • If Age is 105 then will that customer buy the product?
In [2]:
import pandas as pd
sales = pd.read_csv("datasets/Product Sales Data/Product_sales.csv")
In [3]:
#What are the variables in the dataset? 
sales.columns.values
Out[3]:
array(['Age', 'Bought'], dtype=object)
In [4]:
#Build a predictive model for Bought vs Age

### We use the statsmodels package, which provides many statistical methods in Python
import statsmodels.formula.api as smf

model = smf.ols(formula='Bought ~ Age', data=sales)
fitted = model.fit()
fitted.summary()
Out[4]:
OLS Regression Results

Dep. Variable:    Bought            R-squared:           0.842
Model:            OLS               Adj. R-squared:      0.842
Method:           Least Squares     F-statistic:         2480.
Date:             Sun, 16 Oct 2016  Prob (F-statistic):  1.63e-188
Time:             14:35:39          Log-Likelihood:      95.589
No. Observations: 467               AIC:                 -187.2
Df Residuals:     465               BIC:                 -178.9
Df Model:         1
Covariance Type:  nonrobust

              coef   std err        t    P>|t|   [95.0% Conf. Int.]
Intercept  -0.1704     0.015  -11.156    0.000     -0.200   -0.140
Age         0.0209     0.000   49.803    0.000      0.020    0.022

Omnibus:        77.279   Durbin-Watson:     1.362
Prob(Omnibus):   0.000   Jarque-Bera (JB):  1022.092
Skew:            0.056   Prob(JB):          1.14e-222
Kurtosis:       10.247   Cond. No.          60.7
In [5]:
#What is R-Square?
fitted.rsquared
Out[5]:
0.84212439295277375
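R-squared is the fraction of the variance in the dependent variable that the model explains, computed as 1 − SS_res / SS_tot. A minimal sketch of that calculation, using small hypothetical numbers rather than the Product_sales data:

```python
import numpy as np

# R-squared by hand: 1 - SS_res / SS_tot, on a small illustrative sample.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # observed values (made up)
y_hat = np.array([1.1, 1.9, 3.2, 3.9, 5.1])     # hypothetical fitted values

ss_res = np.sum((y - y_hat) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

The 0.842 reported above means Age alone accounts for about 84% of the variance in Bought under the linear fit.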
In [6]:
#If Age is 4 then will that customer buy the product?

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(sales[["Age"]], sales[["Bought"]])

# scikit-learn expects a 2D array of samples for predict()
age1 = [[4]]
predict1 = lr.predict(age1)
predict1
Out[6]:
array([[-0.08664394]])
In [7]:
age2 = [[105]]
predict2 = lr.predict(age2)
predict2
Out[7]:
array([[ 2.02851132]])

Something went wrong

  • The model that we built above is not right.
  • There are issues with the type of the dependent variable.
  • The dependent variable is not continuous; it is binary.
  • We can’t fit a linear regression line to this data.
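Since the dependent variable is binary, a classification model such as logistic regression is the natural fit: its predicted probabilities always stay within [0, 1]. A minimal sketch, using a synthetic stand-in for the Product_sales data (the file itself may not be at hand):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for Product_sales: older customers tend to buy.
rng = np.random.default_rng(0)
age = rng.integers(6, 80, size=200)
bought = (age + rng.normal(0, 10, size=200) > 35).astype(int)

clf = LogisticRegression()
clf.fit(age.reshape(-1, 1), bought)

# Unlike the linear fit above, predicted probabilities stay inside
# [0, 1], even for the extreme ages 4 and 105.
probs = clf.predict_proba([[4], [105]])[:, 1]
print(probs)
```

Compare this with the linear model above, which produced −0.087 for age 4 and 2.03 for age 105 — values that cannot be interpreted as probabilities.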

Why not linear ?

  • Consider the Product sales data. The dataset has two columns.
    • Age – a continuous variable between 6 and 80
    • Bought – binary (0 – No; 1 – Yes)

Real-life examples

  • Gaming – Win vs. Loss
  • Sales – Buying vs. Not buying
  • Marketing – Response vs. No Response
  • Credit card & Loans – Default vs. Non Default
  • Operations – Attrition vs. Retention
  • Websites – Click vs. No click
  • Fraud identification – Fraud vs. Non Fraud
  • Healthcare – Cure vs. No Cure

The outputs in these examples are binary; such nonlinear relationships cannot be captured by a linear model.

Some Nonlinear Functions
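The most widely used of these is the logistic (sigmoid) function, which squashes any real-valued input into the interval (0, 1) — exactly the range a probability needs. A quick illustration:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Extreme inputs are squashed towards 0 and 1, never beyond.
print(sigmoid(-10))  # close to 0
print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
```

This S-shaped curve is the core of the logistic regression model we build in the next sections.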
