Link to the previous post : https://statinfer.com/204-2-4-goodness-of-fit-for-logistic-regression/
The previous post was about goodness of fit; we covered the confusion matrix and will cover the remaining measures in later posts.
But first, let's deal with a common modeling issue, multicollinearity:
- The relation between X and Y is non-linear, so we used logistic regression.
- Multicollinearity, however, is an issue among the predictor variables themselves.
- Multicollinearity needs to be fixed in logistic regression as well.
- Otherwise the individual coefficients of the predictors will be affected by the inter-dependency.
- The process of identification is the same as in linear regression.
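For reference, the identification step carried over from linear regression is the variance inflation factor (VIF), which the practice code in this post computes. A minimal statement of it, where $R_j^2$ is notation assumed here (not from the post) for the R-squared obtained by regressing predictor $x_j$ on all the other predictors:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

A VIF of 1 means the predictor has no linear dependence on the others; the closer $R_j^2$ gets to 1, the larger the VIF becomes.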
Practice : Multicollinearity
- Is there any multicollinearity in the Fiber bits model?
- Identify and remove multicollinearity from the model.
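# Assumes the statsmodels formula API is imported as sm, as the sm.ols call implies
import statsmodels.formula.api as sm

def vif_cal(input_data, dependent_col):
    # Keep only the predictors by dropping the dependent column
    x_vars = input_data.drop([dependent_col], axis=1)
    xvar_names = x_vars.columns
    # shape is a tuple, so use shape[0] for the number of predictors
    for i in range(0, xvar_names.shape[0]):
        # Regress each predictor on all the remaining predictors
        y = x_vars[xvar_names[i]]
        x = x_vars[xvar_names.drop(xvar_names[i])]
        rsq = sm.ols(formula="y~x", data=x_vars).fit().rsquared
        # VIF = 1 / (1 - R-squared)
        vif = round(1 / (1 - rsq), 2)
        print(xvar_names[i], " VIF = ", vif)

# Calculating VIF values using that function
vif_cal(input_data=Fiber, dependent_col="active_cust")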
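income VIF = 1.02
months_on_network VIF = 1.03
Num_complaints VIF = 1.01
number_plan_changes VIF = 1.59
relocated VIF = 1.56
monthly_bill VIF = 1.02
technical_issues_per_month VIF = 1.06
Speed_test_result VIF = 1.0
All VIF values are close to 1 (the highest is 1.59), well below the commonly used cut-off of 5, so there is no sign of multicollinearity in this model and no variable needs to be removed.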
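When some VIF values do come out high, one common way to handle the "remove" part of the exercise is to drop the predictor with the largest VIF, recompute, and repeat. The sketch below is only an illustration under stated assumptions: it uses plain Python on hypothetical toy columns (not the Fiber bits data), the helper names are made up for this example, the threshold of 5 is a rule of thumb, and it computes each VIF as the matching diagonal entry of the inverse correlation matrix, which is equivalent to 1/(1 - R-squared).

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length numeric lists."""
    names = list(columns)
    n = len(columns[names[0]])
    centered = {}
    for name in names:
        mean = sum(columns[name]) / n
        centered[name] = [v - mean for v in columns[name]]

    def corr(a, b):
        num = sum(x * y for x, y in zip(centered[a], centered[b]))
        den = math.sqrt(sum(x * x for x in centered[a]) *
                        sum(y * y for y in centered[b]))
        return num / den

    return names, [[corr(a, b) for b in names] for a in names]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    # Augment the matrix with the identity on the right
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_values(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    names, corr = correlation_matrix(columns)
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

def drop_high_vif(columns, threshold=5.0):
    """Drop the predictor with the largest VIF until all are <= threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        vifs = vif_values(kept)
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical toy data: x3 is almost exactly x1 + x2, so it is collinear
data = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [5, 3, 8, 1, 7, 2, 6, 4],
    "x3": [6.1, 4.9, 11.2, 4.8, 12.1, 7.9, 13.0, 12.0],
}

print(vif_values(data))      # VIFs before dropping anything
kept = drop_high_vif(data)
print(kept)                  # predictors that survive the threshold
```

With a pandas DataFrame such as the one used in this post, the same loop could be driven by the `vif_cal` logic instead, provided that function is changed to return the values rather than print them.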
Individual Impact of Variables
- Out of these predictor variables, which are the important ones?
- If we have to choose the top 5 variables, which are they?
- While selecting the model, we may want to drop a few of the less impactful variables.
- How do we rank the predictor variables in order of their importance?
- We can simply look at the z-value of each variable and compare the absolute values.
- Or calculate the Wald chi-square, which is the square of the z-value.
- The Wald chi-square value helps in ranking the variables.
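The ranking described above can be sketched in a few lines. This is a minimal illustration, not the post's own code: the variable names and the (coefficient, standard error) pairs are hypothetical placeholders rather than values from the Fiber bits model, and `wald_table` is a helper name invented for this example. It computes z as the coefficient divided by its standard error, squares it to get the Wald chi-square, and sorts from most to least impactful.

```python
# Hypothetical (coefficient, standard error) pairs; NOT from the Fiber bits output
estimates = {
    "var_a": (0.80, 0.10),
    "var_b": (-2.50, 0.05),
    "var_c": (0.02, 0.04),
}

def wald_table(estimates):
    """Return (name, z, Wald chi-square) rows sorted by decreasing Wald value."""
    rows = []
    for name, (coef, se) in estimates.items():
        z = coef / se                  # z-value of the coefficient
        rows.append((name, z, z ** 2)) # Wald chi-square is z squared
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows

ranking = wald_table(estimates)
for name, z, wald in ranking:
    print(f"{name}: z = {z:.2f}, Wald chi-square = {wald:.2f}")
```

Sorting on the squared value gives the same order as sorting on the absolute z-value, so the sign of a coefficient does not affect its rank.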
Practice : Individual Impact of Variables
- Identify the top impacting and least impacting variables in the Fiber bits model.
- Find the variable importance and order them based on their impact.
Logistic regression results (summary header):
Dep. Variable: active_cust | No. Observations: 100000
Date: Sun, 16 Oct 2016 | Pseudo R-squ.: 0.2403
Coefficient table columns: coef, std err, z, P>|z|, [95.0% Conf. Int.]
The top impacting variables are relocated and Speed_test_result.
The least impacting variables are monthly_bill and income.
The next post is about model selection in logistic regression.
Link to the next post : https://statinfer.com/204-2-6-model-selection-logistic-regression/