In this series we will explore logistic regression models. To start, we will recap linear regression and check whether it works in every situation.

### Practice: Why do we need logistic regression?

- Dataset: Product Sales Data/Product_sales.csv
- What are the variables in the dataset?
- Build a predictive model for Bought vs Age
- What is R-Square?
- If Age is 4, will that customer buy the product?
- If Age is 105, will that customer buy the product?

In [2]:

```
import pandas as pd
# Forward slashes work on Windows, macOS and Linux alike
sales = pd.read_csv("datasets/Product Sales Data/Product_sales.csv")
```

In [3]:

```
#What are the variables in the dataset?
sales.columns.values
```

Out[3]:

In [4]:

```
# Build a predictive model for Bought vs Age
# statsmodels provides many statistical methods in Python; its formula API accepts R-style formulas
import statsmodels.formula.api as smf
model = smf.ols(formula='Bought ~ Age', data=sales)
fitted = model.fit()
fitted.summary()
```

Out[4]:

In [5]:

```
#What is R-Square?
fitted.rsquared
```

Out[5]:

In [6]:

```
# If Age is 4, will that customer buy the product?
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(sales[["Age"]], sales[["Bought"]])
# predict() expects 2-D input, so wrap the single age in a one-row DataFrame
age1 = pd.DataFrame({"Age": [4]})
predict1 = lr.predict(age1)
predict1
```

Out[6]:

In [7]:

```
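# If Age is 105, will that customer buy the product?
# predict() expects 2-D input, so wrap the single age in a one-row DataFrame
age2 = pd.DataFrame({"Age": [105]})
predict2 = lr.predict(age2)
predict2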
```

Out[7]:

### Something went wrong

- The model that we built above is not right.
- There are certain issues with the type of the dependent variable.
- The dependent variable is not continuous; it is binary.
- We can’t fit a linear regression line to this data.
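
To make the problem concrete, here is a minimal sketch that fits a least-squares line to a small, made-up 0/1 dataset and then predicts for ages 4 and 105. The ages and outcomes below are hypothetical and are not taken from Product_sales.csv. The helper names `fit_line` and `predict_linear` are also just for illustration, and plain Python is used so the snippet runs without the CSV file.

```python
# Hypothetical toy data: ages and a binary 0/1 outcome (NOT the real Product_sales.csv)
ages = [10, 20, 30, 40, 50, 60, 70, 80]
bought = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(ages, bought)

def predict_linear(age):
    # Straight-line prediction: nothing keeps it between 0 and 1
    return intercept + slope * age

pred_4 = predict_linear(4)
pred_105 = predict_linear(105)
print(pred_4)    # below 0 for this toy data
print(pred_105)  # above 1 for this toy data
```

A 0/1 outcome can only sensibly be predicted as a value between 0 and 1, yet the straight line drops below 0 and rises above 1 as soon as we move outside the observed age range.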

### Why not linear?

- Consider the product sales data. The dataset has two columns:
- Age – continuous variable between 6 and 80
- Bought – binary variable (0 - Yes; 1 - No)

### Real-life examples

- Gaming – Win vs. Loss
- Sales – Buying vs. Not buying
- Marketing – Response vs. No Response
- Credit card & Loans – Default vs. Non Default
- Operations – Attrition vs. Retention
- Websites – Click vs. No click
- Fraud identification – Fraud vs. Non Fraud
- Healthcare – Cure vs. No Cure

The outcomes in these examples are binary, so they cannot be adequately described by a linear model.
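
As a preview of the idea behind logistic regression, the sketch below passes a linear score through the logistic (sigmoid) function, which squeezes any real number into the range 0 to 1. The coefficients `b0` and `b1` are hypothetical values chosen for illustration, not estimates from the sales data, and `predict_probability` is an illustrative helper name.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only (not fitted to Product_sales.csv)
b0, b1 = -4.5, 0.1

def predict_probability(age):
    # Linear score first, then squeeze it into (0, 1)
    return sigmoid(b0 + b1 * age)

p_4 = predict_probability(4)      # linear score is about -4.1
p_45 = predict_probability(45)    # linear score is about 0, so the result is about 0.5
p_105 = predict_probability(105)  # linear score is about 6.0
print(p_4, p_45, p_105)
```

Unlike the straight line, this function never returns a value below 0 or above 1, even for ages far outside the observed range.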