In this series we will explore logistic regression models. To start, we will recap linear regression and check whether it works in every situation.
Practice : Why do we need logistic regression?
- Dataset: Product Sales Data/Product_sales.csv
- What are the variables in the dataset?
- Build a predictive model for Bought vs Age
- What is the R-squared value?
- If Age is 4, will that customer buy the product?
- If Age is 105, will that customer buy the product?
In [2]:
import pandas as pd
# Forward slashes work on Windows, macOS and Linux alike
sales=pd.read_csv("datasets/Product Sales Data/Product_sales.csv")
In [3]:
#What are the variables in the dataset?
sales.columns.values
Out[3]:
In [4]:
#Build a predictive model for Bought vs Age
# We use the statsmodels package, which provides many statistical methods in Python
import statsmodels.formula.api as sm
# Ordinary least squares with Bought as the response and Age as the predictor
model = sm.ols(formula='Bought ~ Age', data=sales)
fitted = model.fit()
fitted.summary()
Out[4]:
In [5]:
#What is R-Square?
fitted.rsquared
Out[5]:
In [6]:
#If Age is 4 then will that customer buy the product?
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(sales[["Age"]], sales[["Bought"]])
age1=4
# predict expects a 2D input (rows x features), not a bare number
predict1=lr.predict(pd.DataFrame({"Age": [age1]}))
predict1
Out[6]:
In [7]:
age2=105
# predict expects a 2D input (rows x features), not a bare number
predict2=lr.predict(pd.DataFrame({"Age": [age2]}))
predict2
Out[7]:
Something went wrong
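- The model that we built above is not right.
- There are issues with the type of the dependent variable.
- The dependent variable is not continuous; it is binary.
- We can’t fit a linear regression line to this data. A straight line is unbounded, so it can predict values below 0 or above 1, which make no sense for a yes/no outcome.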
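To make the problem concrete, here is a minimal sketch on made-up data. The `ages` and `bought` lists are hypothetical and are not taken from Product_sales.csv, and the fit uses the closed-form least-squares formulas, which give the same kind of straight line that a linear regression fits for a single predictor.

```python
# Hypothetical toy data (NOT from Product_sales.csv): ages and a 0/1 outcome.
ages = [10, 20, 30, 40, 50, 60, 70, 80]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

# Closed-form least-squares fit of: bought = intercept + slope * age
n = len(ages)
mean_age = sum(ages) / n
mean_bought = sum(bought) / n
sxy = sum((x - mean_age) * (y - mean_bought) for x, y in zip(ages, bought))
sxx = sum((x - mean_age) ** 2 for x in ages)
slope = sxy / sxx
intercept = mean_bought - slope * mean_age


def predict_linear(age):
    """Return the fitted line's value at the given age."""
    return intercept + slope * age


# Evaluate the line at the two ages asked about in the practice questions.
for age in (4, 105):
    print(age, round(predict_linear(age), 3))
```

On this toy data the fitted line gives a negative value at age 4 and a value above 1 at age 105, so neither result can be read as a valid 0/1 outcome or as a probability.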
Why not linear?
- Consider the product sales data. The dataset has two columns.
- Age – continuous variable between 6 and 80
- Bought – binary variable (0- Yes ; 1-No)
Real-life examples
- Gaming – Win vs. Loss
- Sales – Buying vs. Not buying
- Marketing – Response vs. No Response
- Credit card & Loans – Default vs. Non Default
- Operations – Attrition vs. Retention
- Websites – Click vs. No click
- Fraud identification – Fraud vs. Non Fraud
- Healthcare – Cure vs. No Cure
Binary outcomes like these cannot be modeled properly with a linear model.
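As a preview of the idea behind logistic regression, the sketch below passes a linear score through the logistic (sigmoid) function, which always returns a value strictly between 0 and 1. The coefficients `INTERCEPT` and `SLOPE` are hypothetical values picked for illustration; they are not fitted to Product_sales.csv.

```python
import math

# Hypothetical coefficients for illustration only (not estimated from any data).
INTERCEPT = -4.5
SLOPE = 0.1


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(age):
    """Apply the logistic function to the linear score INTERCEPT + SLOPE * age."""
    return logistic(INTERCEPT + SLOPE * age)


# Even for extreme ages the result stays inside the 0-1 range.
for age in (4, 105):
    print(age, round(predict_probability(age), 4))
```

Unlike a straight line, this curve flattens out at both ends, so an age of 4 or 105 still yields a value that can be interpreted as a probability.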