
# 203.2.5 Multi-collinearity and Individual Impact Of Variables in Logistic Regression

## Multicollinearity

In the previous section, we studied goodness of fit for logistic regression.

• We use logistic regression when the response Y is binary, so the relation between X and Y is not linear
• Multicollinearity is an issue with the predictor variables themselves, so it needs to be fixed in logistic regression as well
• Otherwise the individual coefficients of the predictors will be affected by the interdependency among them
• The process of identification is the same as in linear regression: check the variance inflation factor (VIF) of each predictor

### Multicollinearity in R

```r
library(car)
## Warning: package 'car' was built under R version 3.1.3

# Variance inflation factor of each predictor in the fitted model
vif(Fiberbits_model_1)
##                     income          months_on_network
##                   4.590705                   4.641040
##             Num_complaints        number_plan_changes
##                   1.018607                   1.126892
##                  relocated               monthly_bill
##                   1.145847                   1.017565
## technical_issues_per_month          Speed_test_result
##                   1.020648                   1.206999
```
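All of the VIF values above are below 5, a commonly used rule-of-thumb cutoff, so none of the predictors signals serious multicollinearity. To show what the number means, here is a minimal sketch of the textbook definition, VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The data and the names `x1`, `x2`, `x3` and `manual_vif` are made up for illustration and are not part of the Fiberbits data. For a glm, `car::vif()` works from the fitted model's coefficient covariance, so its values may differ slightly from this hand calculation.

```r
# Simulated predictors (hypothetical; not the Fiberbits data set)
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)  # deliberately correlated with x1
x3 <- rnorm(n)                        # independent of the others
predictors <- data.frame(x1, x2, x3)

# Textbook VIF: regress each predictor on the rest, then 1 / (1 - R^2)
manual_vif <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  fit <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(fit)$r.squared)
})

print(round(manual_vif, 2))
```

In this sketch `x1` and `x2` should show a much higher VIF than `x3`, because each of them can be largely predicted from the other.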

### Individual Impact of Variables

• Out of these predictor variables, which are the important ones?
• If we have to choose the top 5 variables, which are they?
• While selecting the model, we may want to drop a few low-impact variables
• How do we rank the predictor variables in order of their importance?
• We can simply look at the z value of each variable and compare the absolute values
• Or we can calculate the Wald chi-square, which for a single coefficient is the square of its z value
• Either way, a larger value means a stronger individual impact, so the Wald chi-square helps in ranking the variables
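The ranking idea in the list above can be sketched without any extra package. This is a minimal example on simulated data, not the Fiberbits model; the names `x1`, `x2`, `model` and `importance` are assumptions made for illustration. It reads the z values from `summary()` of a glm fit and squares the estimate-to-standard-error ratio to get the Wald chi-square.

```r
# Simulated data (hypothetical): x1 has a strong effect, x2 a weak one
set.seed(1)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(-0.5 + 1.5 * x1 + 0.2 * x2))

model <- glm(y ~ x1 + x2, family = binomial())

# Coefficient table columns: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- summary(model)$coefficients
coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]

importance <- data.frame(
  abs_z      = abs(coefs[, "z value"]),
  wald_chisq = (coefs[, "Estimate"] / coefs[, "Std. Error"])^2,
  row.names  = rownames(coefs)
)

# Rank predictors from most to least impactful
importance <- importance[order(-importance$abs_z), ]
print(importance)
```

Both columns give the same ordering, since squaring preserves the order of absolute values.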

### Code-Individual Impact of Variables

```r
library(caret)
## Warning: package 'caret' was built under R version 3.1.3
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.1.3

# Unscaled importance score of each predictor
varImp(Fiberbits_model_1, scale = FALSE)
##                             Overall
## income                     20.81981
## months_on_network          28.65421
## Num_complaints             22.81102
## number_plan_changes        24.93955
## relocated                  79.92677
## monthly_bill               13.99490
## technical_issues_per_month 54.58123
## Speed_test_result          93.43471
```

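For a glm model, varImp() gives the absolute value of each variable's z-score. Here the top 5 variables are Speed_test_result, relocated, technical_issues_per_month, months_on_network and number_plan_changes, while monthly_bill has the least impact.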

### Model Selection – AIC and BIC

• AIC and BIC play a role similar to adjusted R-squared in linear regression: they reward goodness of fit but penalize extra parameters
• The AIC of a single stand-alone model has no real use, but when we are choosing between models AIC really helps
• Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models
• If we are choosing between two models, the model with the lower AIC is preferred
• AIC is an estimate of the information lost when a given model is used to represent the process that generates the data
• AIC = -2 ln(L) + 2k
• L is the maximum value of the likelihood function for the model
• k is the number of estimated parameters (the predictor coefficients plus the intercept)
• BIC is an alternative to AIC with a slightly different formula: the penalty is k ln(n) instead of 2k, where n is the number of observations. We will follow either AIC or BIC throughout our analysis
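To make the formulas concrete, here is a minimal sketch that computes AIC and BIC by hand and checks them against R's built-in functions. The data and the names `model_small` and `model_large` are simulated and hypothetical, not the Fiberbits model. Here k is taken from the `df` attribute of `logLik()`, which counts every estimated coefficient, including the intercept.

```r
# Simulated data (hypothetical): x2 has no real effect on y
set.seed(7)
n  <- 800
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.3 + 1.0 * x1))

model_small <- glm(y ~ x1, family = binomial())
model_large <- glm(y ~ x1 + x2, family = binomial())

ll    <- logLik(model_large)   # maximized log-likelihood, ln(L)
k     <- attr(ll, "df")        # number of estimated parameters
n_obs <- nobs(model_large)     # number of observations

manual_aic <- -2 * as.numeric(ll) + 2 * k
manual_bic <- -2 * as.numeric(ll) + log(n_obs) * k

print(c(manual_aic = manual_aic, builtin_aic = AIC(model_large)))
print(c(manual_bic = manual_bic, builtin_bic = BIC(model_large)))

# Side-by-side comparison: the model with the lower AIC is preferred
print(AIC(model_small, model_large))
```

Passing several models to `AIC()` at once returns one row per model, which is a convenient way to compare candidates.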

### Code-AIC and BIC

```r
library(stats)  # loaded by default in R; shown for completeness
AIC(Fiberbits_model_1)
## [1] 98377.36
BIC(Fiberbits_model_1)
## [1] 98462.97
```

The next post is about Model Selection in logistic regression.
