Link to the previous post : https://statinfer.com/204-2-1-logistic-regression-why-do-we-need-it/

In the last post we saw that linear regression cannot be used when the final output is binary (yes or no), because it is tough to fit a binary output with a linear function.

To solve this problem we can move to a different kind of function, the logistic function being the first choice.

A Logistic Function

This is what a logistic function looks like:

The Logistic function

We want a model that predicts probabilities between 0 and 1, which calls for an S-shaped curve.
There are many S-shaped curves. We use the logistic model: $Probability = \frac{e^{(\beta_0 + \beta_1 X)}}{1 + e^{(\beta_0 + \beta_1 X)}}$
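
To get a feel for the S shape, here is a minimal sketch of the logistic function in plain Python. The function name `logistic` and the default values for `beta0` and `beta1` are assumptions chosen for illustration, not values from this post.

```python
import math

def logistic(x, beta0=0.0, beta1=1.0):
    """Return e^(beta0 + beta1*x) / (1 + e^(beta0 + beta1*x))."""
    z = beta0 + beta1 * x
    # Algebraically equivalent forms, chosen so exp() never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# The output rises from near 0 to near 1 and passes through 0.5 at z = 0
for x in (-6, -2, 0, 2, 6):
    print(x, round(logistic(x), 4))
```

Whatever value X takes, the result stays between 0 and 1, which is what lets us read it as a probability.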

Logistic Regression Output

In logistic regression, we predict a probability instead of the value itself.
Y is binary: it takes only two values, 1 and 0. Instead of predicting 1 or 0 directly, we predict the probability of 1 and the probability of 0.
This suits binary categorical outputs such as YES vs NO, WIN vs LOSS, or Fraud vs Non-Fraud.
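
As a small sketch of how a predicted probability can be turned back into a 1 or 0: the helper name `to_class`, the example probability 0.8, and the 0.5 cutoff are all assumptions for illustration. A cutoff of 0.5 is a common default, but the post does not prescribe one.

```python
def to_class(p_one, threshold=0.5):
    """Map P(Y = 1) to a class label using a cutoff (0.5 is an assumed default)."""
    return 1 if p_one >= threshold else 0

p_one = 0.8          # hypothetical predicted probability that Y = 1
p_zero = 1 - p_one   # the two probabilities always sum to 1

print(p_one, round(p_zero, 2), to_class(p_one))
```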

Practice : Logistic Regression

Dataset: Product Sales Data/Product_sales.csv
Build a logistic regression line between Age and Bought.
Will a 4-year-old customer buy the product?
If Age is 105, will that customer buy the product?

In [8]:

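import pandas as pd
import statsmodels.api as sm

# Load the product sales data (Windows-style relative path)
sales = pd.read_csv("datasets\\Product Sales Data\\Product_sales.csv")

# Build a logistic regression line between Age and Bought.
# No constant column is added, so this model has no intercept term;
# the summary therefore reports a coefficient for Age only.
logit = sm.Logit(sales['Bought'], sales['Age'])
logit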

Out[8]:

<statsmodels.discrete.discrete_model.Logit at 0x203ba4ac630>

In [9]:

result = logit.fit()
result

Optimization terminated successfully.
         Current function value: 0.584320
         Iterations 5

Out[9]:

<statsmodels.discrete.discrete_model.BinaryResultsWrapper at 0x203bbd90e48>

In [10]:

result.summary()

Out[10]:

Logit Regression Results
Dep. Variable:    Bought              No. Observations:    467
Model:            Logit               Df Residuals:        466
Method:           MLE                 Df Model:            0
Date:             Sun, 16 Oct 2016    Pseudo R-squ.:       0.1478
Time:             14:35:42            Log-Likelihood:      -272.88
converged:        True                LL-Null:             -320.21
                                      LLR p-value:         nan

       coef      std err    z        P>|z|    [95.0% Conf. Int.]
Age    0.0294    0.003      8.813    0.000    0.023    0.036
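
The summary reports a single coefficient for Age (0.0294) and no intercept. As a rough worked example, we can plug that coefficient into the logistic formula by hand for the two ages in the practice questions. The helper name `prob_buy` is an assumption, and the coefficient is the rounded value from the table, so the results are approximate.

```python
import math

BETA_AGE = 0.0294  # coefficient for Age, copied from the summary table (rounded)

def prob_buy(age, beta_age=BETA_AGE):
    """P(Bought = 1) for a model with an Age coefficient and no intercept."""
    z = beta_age * age
    return math.exp(z) / (1.0 + math.exp(z))

p4 = prob_buy(4)      # roughly 0.53
p105 = prob_buy(105)  # roughly 0.96
print(round(p4, 3), round(p105, 3))
```

Because this fit has no intercept, any positive age gives z > 0 and therefore a probability above 0.5. A model that also estimates an intercept can give quite different answers for very young customers.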

In [11]:

# Confidence interval of each coefficient

print(result.conf_int())

            0         1
Age  0.022851  0.035923
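
For intuition about where this interval comes from, here is a sketch that rebuilds it from the rounded numbers in the summary table. It assumes the usual normal-approximation interval, coefficient ± 1.96 × standard error, and recovers the standard error as coefficient / z. Since the inputs are rounded, the result only matches the printed interval approximately.

```python
coef = 0.0294    # Age coefficient from the summary table (rounded)
z_stat = 8.813   # z statistic from the summary table

# z = coef / std_err, so the standard error can be recovered from the table
std_err = coef / z_stat

Z_95 = 1.96      # approximate two-sided 95% critical value of the standard normal
lower = coef - Z_95 * std_err
upper = coef + Z_95 * std_err
print(round(lower, 4), round(upper, 4))
```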

In [12]:

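# One more way of fitting the model: scikit-learn.
# Unlike the statsmodels fit, this one estimates an intercept by default
# (fit_intercept=True) and applies L2 regularization, so the two fits differ.
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(sales[["Age"]],sales["Bought"])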

Out[12]:

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [13]:

# Will a 4-year-old customer buy the product?
# predict() expects a 2-D input with one row per customer
age1 = pd.DataFrame({"Age": [4]})
predict_age1 = logistic.predict(age1)
print(predict_age1)

[0]

In [14]:

# If Age is 105, will that customer buy the product?
# predict() expects a 2-D input with one row per customer
age2 = pd.DataFrame({"Age": [105]})
predict_age2 = logistic.predict(age2)
print(predict_age2)

[1]
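
Note that `predict` returns class labels (0 or 1), not probabilities. If you want the probabilities themselves, scikit-learn's `predict_proba` returns one column per class. The sketch below uses a small made-up dataset (the `ages` and `bought` lists are hypothetical, not taken from Product_sales.csv) so that it runs on its own.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: older customers buy more often
ages = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
bought = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

model = LogisticRegression()
model.fit([[a] for a in ages], bought)  # features must be 2-D: one row per customer

# Each row is [P(Bought = 0), P(Bought = 1)] for one customer
proba = model.predict_proba([[4], [105]])
print(model.classes_)
print(proba)
```

The same call on the model fitted above would be `logistic.predict_proba(...)`, with the input again given as a 2-D structure.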


The next post is on multiple logistic regression.
Link to the next post: https://statinfer.com/204-2-3-multiple-logistic-regression/

21st June 2017

204.2.2 Logistic Function to Regression

From function to regression.


Statinfer
