Link to the previous post: https://statinfer.com/204-2-2-logistic-function-to-regression/
In the previous post (204.2.2), we built a logistic regression model with a single input variable. In this post, we will use multiple input variables.
Multiple Logistic Regression
- The dependent variable is binary.
- Instead of a single independent (predictor) variable, we have multiple predictors.
- For example, whether a customer buys or does not buy depends on customer attributes such as age, gender, place, and income.
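Written as a formula, the model extends the single-variable logistic function from the previous post by replacing the single term with a linear combination of all predictors. The symbols below (\beta for coefficients, x for predictors, k for the number of predictors) are conventional notation assumed here for illustration, not taken from the original post.

```latex
P(y = 1 \mid x_1, \dots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\log\frac{P(y = 1)}{1 - P(y = 1)} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```

Each coefficient \beta_j is the change in the log odds of y = 1 for a one-unit increase in x_j, holding the other predictors fixed.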
Practice : Multiple Logistic Regression
- Dataset: Fiberbits/Fiberbits.csv
- The active_cust variable indicates whether the customer is still active or has already left the network.
- Build a model to predict the chance of attrition for a given customer using all the features.
- How good is your model?
- Which variables have the most impact?
In [15]:
import pandas as pd

#Dataset: Fiberbits/Fiberbits.csv
Fiber=pd.read_csv("datasets\\Fiberbits\\Fiberbits.csv")
#List the column names to see which variables are available
list(Fiber.columns.values)
Out[15]:
In [16]:
#Build a model to predict the chance of attrition for a given customer using all the features.
from sklearn.linear_model import LogisticRegression
logistic=LogisticRegression()
#Fit logistic regression for active_cust on the rest of the variables
#The target is passed as a 1-D Series; a one-column DataFrame triggers a shape warning
logistic.fit(Fiber[['income','months_on_network','Num_complaints','number_plan_changes','relocated','monthly_bill','technical_issues_per_month','Speed_test_result']],Fiber['active_cust'])
Out[16]:
In [17]:
#Predicted class labels (0/1) for every customer in the dataset
predict1=logistic.predict(Fiber[['income','months_on_network','Num_complaints','number_plan_changes','relocated','monthly_bill','technical_issues_per_month','Speed_test_result']])
predict1
Out[17]:
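The practice task asks for the chance of attrition, while predict() returns only hard 0/1 labels. A minimal sketch of how probabilities could be obtained with predict_proba() follows. It uses synthetic stand-in data rather than the Fiberbits file, and the two predictors and the rule that generates the target are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (NOT the Fiberbits dataset): two made-up predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
# Made-up rule that produces a binary target from the two predictors plus noise
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

labels = model.predict(X)        # hard 0/1 class labels
probs = model.predict_proba(X)   # one column per class; each row sums to 1
p_class1 = probs[:, 1]           # estimated probability of class 1

# For a binary model, predict() corresponds to thresholding
# the class-1 probability at 0.5
labels_from_probs = (p_class1 > 0.5).astype(int)

print(p_class1[:5])
print((labels == labels_from_probs).all())
```

Applied to the model above, and assuming active_cust is coded 1 for active customers and 0 for customers who left, the chance of attrition would be the class-0 column, logistic.predict_proba(...)[:, 0].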
In [18]:
#How good is your model?
#Calculate the confusion matrix
#(the unused imports were dropped; sklearn.cross_validation no longer exists in current scikit-learn)
from sklearn.metrics import confusion_matrix
#Rows are the actual classes, columns are the predicted classes
cm1 = confusion_matrix(Fiber['active_cust'],predict1)
print(cm1)
In [19]:
#Total number of observations = sum of all cells in the confusion matrix
total1=cm1.sum()
total1
Out[19]:
In [20]:
#Calculate accuracy from the confusion matrix: correct predictions (diagonal) / all predictions
accuracy1=(cm1[0,0]+cm1[1,1])/total1
accuracy1
Out[20]:
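To make the accuracy calculation concrete, here is a small worked example on ten hand-made labels (illustrative values, not Fiberbits output). It computes accuracy from the confusion matrix diagonal, as above, and cross-checks it against scikit-learn's accuracy_score.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Ten hand-made actual and predicted labels, for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Rows are actual classes (0, 1); columns are predicted classes (0, 1)
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 5]]

# Correct predictions sit on the diagonal: 3 true negatives + 5 true positives
accuracy_from_cm = (cm[0, 0] + cm[1, 1]) / cm.sum()
print(accuracy_from_cm)                # 0.8

# The built-in helper gives the same number
print(accuracy_score(y_true, y_pred))  # 0.8
```

Accuracy alone can look good when one class dominates, so the off-diagonal cells (1 false positive and 1 false negative here) are worth reading as well.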
In [21]:
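#What are the most impacting variables?
#Read them from the summary of a statsmodels Logit model
import statsmodels.api as sm
#Note: sm.Logit does not add an intercept on its own; wrap the predictors in sm.add_constant(...) if one is wanted
logit1=sm.Logit(Fiber['active_cust'],Fiber[['income','months_on_network','Num_complaints','number_plan_changes','relocated','monthly_bill','technical_issues_per_month','Speed_test_result']])
logit1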
Out[21]:
In [22]:
result1=logit1.fit()
In [23]:
result1.summary()
Out[23]:
In [24]:
result1.summary2()
Out[24]:
For every variable, the p-value in the summary is below 0.05, so all of them have a statistically significant impact on active_cust.
The next post is about goodness of fit for logistic regression.
Link to the next post : https://statinfer.com/204-2-4-goodness-of-fit-for-logistic-regression/