
# 204.2.5 Multicollinearity and Individual Impact Of Variables in Logistic Regression

##### Solving the issue of multicollinearity and finding the impact of individual variables.

Link to the previous post : https://statinfer.com/204-2-4-goodness-of-fit-for-logistic-regression/

The previous post was about goodness of fit. We covered the confusion matrix there, and the remaining measures will be covered in upcoming posts.

But first, let’s deal with a common issue in modeling:

## Multicollinearity

• The relation between X and Y is non-linear, so we used logistic regression.
• Multicollinearity is an issue related to the predictor variables only.
• Multicollinearity needs to be fixed in logistic regression as well.
• Otherwise, the individual coefficients of the predictors will be affected by the inter-dependency.
• The process of identification is the same as in linear regression.
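
To make the idea of inter-dependency concrete, here is a minimal sketch of what VIF measures. It assumes the special case of exactly two predictors, where the R-squared of one predictor regressed on the other equals their squared Pearson correlation. The lists `tenure`, `tenure_copy` and `unrelated` and both helper functions are hypothetical and are not part of the Fiber data.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 equals r^2."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical toy predictors (not from the Fiber dataset)
tenure = [1, 2, 3, 4, 5, 6, 7, 8]
tenure_copy = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]  # almost a duplicate of tenure
unrelated = [1, -1, -1, 1, 1, -1, -1, 1]                # uncorrelated with tenure

vif_near_copy = vif_two_predictors(tenure, tenure_copy)
vif_unrelated = vif_two_predictors(tenure, unrelated)

print("VIF with a near-duplicate predictor:", round(vif_near_copy, 2))
print("VIF with an unrelated predictor:", round(vif_unrelated, 2))
```

The near-duplicate pair gives a very large VIF, while the unrelated pair gives a VIF of 1, which is the smallest value VIF can take.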

### Practice : Multicollinearity

• Is there any multicollinearity in the fiber bits model?
• Identify and remove multicollinearity from the model.
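In [27]:
# sm is assumed to be statsmodels.formula.api, which provides the lowercase ols()
import statsmodels.formula.api as sm

def vif_cal(input_data, dependent_col):
    # Keep only the predictors by dropping the dependent column
    x_vars = input_data.drop([dependent_col], axis=1)
    xvar_names = x_vars.columns
    for i in range(0, xvar_names.shape[0]):
        # Regress predictor i on all the remaining predictors
        y = x_vars[xvar_names[i]]
        x = x_vars[xvar_names.drop(xvar_names[i])]
        # The formula "y~x" refers to the local variables y and x defined above
        rsq = sm.ols(formula="y~x", data=x_vars).fit().rsquared
        # VIF = 1 / (1 - R^2)
        vif = round(1 / (1 - rsq), 2)
        print(xvar_names[i], " VIF = ", vif)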

In [28]:
#Calculating VIF values using that function
vif_cal(input_data=Fiber, dependent_col="active_cust")

income  VIF =  1.02
months_on_network  VIF =  1.03
Num_complaints  VIF =  1.01
number_plan_changes  VIF =  1.59
relocated  VIF =  1.56
monthly_bill  VIF =  1.02
technical_issues_per_month  VIF =  1.06
Speed_test_result  VIF =  1.0
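
One way to answer the practice question is to compare each VIF with a cutoff. The sketch below copies the values printed above and assumes a rule-of-thumb cutoff of 5. That cutoff is an assumption, not something this post states, and some practitioners use 10 instead.

```python
# VIF values copied from the output above
vif_values = {
    "income": 1.02,
    "months_on_network": 1.03,
    "Num_complaints": 1.01,
    "number_plan_changes": 1.59,
    "relocated": 1.56,
    "monthly_bill": 1.02,
    "technical_issues_per_month": 1.06,
    "Speed_test_result": 1.0,
}

VIF_CUTOFF = 5  # assumed rule-of-thumb cutoff, not taken from the post

# Predictors whose VIF exceeds the cutoff would be candidates for removal
flagged = [name for name, vif in vif_values.items() if vif > VIF_CUTOFF]
highest = max(vif_values, key=vif_values.get)

print("Variables above the cutoff:", flagged)
print("Highest VIF:", highest, vif_values[highest])
```

With these values the flagged list is empty, so under this rule there is no multicollinearity to remove from the fiber bits model.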


## Individual Impact of Variables

• Out of these predictor variables, which are the important ones?
• If we have to choose the top 5 variables, which are they?
• While selecting the model, we may want to drop a few less impactful variables.
• How do we rank the predictor variables in order of their importance?
• We can simply look at the z value of each variable and compare the absolute values.
• Or we can calculate the Wald chi-square, which for a single coefficient is the square of the z value.
• The Wald chi-square value helps in ranking the variables.
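
The relation between the two measures can be written out explicitly. The notation here is an assumption for illustration: $\hat{\beta}_j$ is the estimated coefficient of variable $j$ and $SE(\hat{\beta}_j)$ is its standard error, as reported in a model summary.

$$
z_j = \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}, \qquad W_j = \left(\frac{\hat{\beta}_j}{SE(\hat{\beta}_j)}\right)^2 = z_j^2
$$

Because squaring preserves the order of absolute values, ranking the variables by $|z_j|$ and ranking them by $W_j$ give the same order.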

### Practice : Individual Impact of Variables

• Identify the top impacting and least impacting variables in the fiber bits model.
• Find the variable importance and order the variables based on their impact.
In [29]:
result1.summary()

Out[29]:
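| | | | |
|---|---|---|---|
| Dep. Variable: | active_cust | No. Observations: | 100000 |
| Model: | Logit | Df Residuals: | 99992 |
| Method: | MLE | Df Model: | 7 |
| Date: | Sun, 16 Oct 2016 | Pseudo R-squ.: | 0.2403 |
| Time: | 14:35:51 | Log-Likelihood: | -51717 |
| converged: | True | LL-Null: | -68074 |
| | | LLR p-value: | 0 |

| | coef | std err | z | P>\|z\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| income | 1.71e-05 | 4.17e-06 | 4.097 | 0.000 | 8.92e-06 2.53e-05 |
| months_on_network | 0.0150 | 0.000 | 31.172 | 0.000 | 0.014 0.016 |
| Num_complaints | -1.7669 | 0.027 | -65.284 | 0.000 | -1.820 -1.714 |
| number_plan_changes | -0.1784 | 0.007 | -23.909 | 0.000 | -0.193 -0.164 |
| relocated | -3.0826 | 0.040 | -76.259 | 0.000 | -3.162 -3.003 |
| monthly_bill | -0.0024 | 0.000 | -16.014 | 0.000 | -0.003 -0.002 |
| technical_issues_per_month | -0.4636 | 0.007 | -64.010 | 0.000 | -0.478 -0.449 |
| Speed_test_result | 0.1094 | 0.001 | 75.073 | 0.000 | 0.107 0.112 |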

Top impacting variables are – relocated & Speed_test_result.

Least impacting variables are – monthly_bill & income.
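
The ranking can be reproduced with a short sketch. It pairs the z values from the summary output with variable names on the assumption that the summary rows follow the same order as the VIF output earlier in this post. The dictionary names `z_values` and `wald` are introduced only for illustration.

```python
# z values from the summary output; the pairing with variable names assumes
# the rows are in the same order as the VIF output (an assumption)
z_values = {
    "income": 4.097,
    "months_on_network": 31.172,
    "Num_complaints": -65.284,
    "number_plan_changes": -23.909,
    "relocated": -76.259,
    "monthly_bill": -16.014,
    "technical_issues_per_month": -64.010,
    "Speed_test_result": 75.073,
}

# For a single coefficient, the Wald chi-square is the squared z value
wald = {name: z ** 2 for name, z in z_values.items()}

# Most impacting first
ranked = sorted(wald, key=wald.get, reverse=True)
for name in ranked:
    print(f"{name:28s} Wald chi-square = {wald[name]:10.2f}")
```

Under that assumption, the first two names in `ranked` are relocated and Speed_test_result, and the last two are monthly_bill and income, which agrees with the conclusions above.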

The next post is about model selection in logistic regression.

Link to the next post : https://statinfer.com/204-2-6-model-selection-logistic-regression/

21st June 2017
