203.2.5 Multicollinearity and Individual Impact of Variables in Logistic Regression

Multicollinearity

In the previous section, we studied goodness of fit for logistic regression.

  • When the relation between X and Y is non-linear, we use logistic regression.
  • Multicollinearity is an issue related to the predictor variables, and it needs to be fixed in logistic regression as well.
  • Otherwise, the individual coefficients of the predictors will be affected by the interdependency between them.
  • The process of identification is the same as in linear regression: we compute the VIF for each predictor, as shown below.

Multicollinearity in R

library(car)             # the car package provides the vif() function
vif(Fiberbits_model_1)   # variance inflation factor for each predictor
##                     income          months_on_network 
##                   4.590705                   4.641040 
##             Num_complaints        number_plan_changes 
##                   1.018607                   1.126892 
##                  relocated               monthly_bill 
##                   1.145847                   1.017565 
## technical_issues_per_month          Speed_test_result 
##                   1.020648                   1.206999
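
None of these VIF values cross the commonly used rule-of-thumb cutoff of 5, so multicollinearity is not a serious concern in this model. As a minimal sketch (the cutoff of 5 is a convention, not part of the output above), the check can be automated:

v <- vif(Fiberbits_model_1)   # named vector of VIF values
v[v > 5]                      # predictors crossing the cutoff, if any; empty here

If any predictor crossed the cutoff, we would drop it (or combine the correlated predictors) and refit the model before interpreting the individual coefficients.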

Individual Impact of Variables

  • Out of these predictor variables, which are the important ones?
  • If we have to choose the top 5 variables, what are they?
  • While selecting the model, we may want to drop a few less impactful variables.
  • How do we rank the predictor variables in the order of their importance?
  • We can simply look at the z-value of each variable and compare their absolute values.
  • Or we can calculate the Wald chi-square, which is the square of the z-score.
  • The Wald chi-square value helps in ranking the variables; a sketch of this calculation appears after the varImp() output below.

Code-Individual Impact of Variables

library(caret)                             # provides varImp(); loads lattice and ggplot2
varImp(Fiberbits_model_1, scale = FALSE)   # for glm models, importance is |z-value|
##                             Overall
## income                     20.81981
## months_on_network          28.65421
## Num_complaints             22.81102
## number_plan_changes        24.93955
## relocated                  79.92677
## monthly_bill               13.99490
## technical_issues_per_month 54.58123
## Speed_test_result          93.43471

This gives the absolute value of the z-score for each variable. Going by these values, the top 5 variables are Speed_test_result, relocated, technical_issues_per_month, months_on_network, and number_plan_changes.
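
The Wald chi-square ranking mentioned earlier can be reproduced directly from the model summary; a minimal sketch using only the fitted model object:

# Wald chi-square is the square of the z-value reported by summary()
z <- summary(Fiberbits_model_1)$coefficients[, "z value"]
wald_chisq <- z^2
# Rank the predictors (dropping the intercept) and pick the top 5
head(sort(wald_chisq[-1], decreasing = TRUE), 5)

Because squaring preserves the ordering of the absolute values, this ranking agrees with the varImp() output above.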

Model Selection – AIC and BIC

  • AIC and BIC values play a role similar to adjusted R-squared in linear regression.
  • A stand-alone AIC value has no real use, but AIC really helps when we are choosing between models.
  • Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models.
  • If we are choosing between two models, the model with the lower AIC is preferred.
  • AIC is an estimate of the information lost when a given model is used to represent the process that generated the data.
  • AIC = -2ln(L) + 2k
  • L is the maximum value of the likelihood function for the model.
  • k is the number of estimated parameters in the model (the coefficients, including the intercept).
  • BIC is an alternative to AIC with a slightly different formula: BIC = -2ln(L) + k·ln(n), where n is the number of observations. We will follow either AIC or BIC throughout our analysis.

Code-AIC and BIC

library(stats)   # AIC() and BIC() come from the base stats package, loaded by default
AIC(Fiberbits_model_1)
## [1] 98377.36
BIC(Fiberbits_model_1)
## [1] 98462.97
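
To connect these numbers back to the formulas, here is a quick hedged check computed from the fitted model object (attr(ll, "df") is where R stores the number of estimated parameters alongside the log-likelihood):

ll <- logLik(Fiberbits_model_1)    # maximized log-likelihood
k <- attr(ll, "df")                # number of estimated parameters
n <- nobs(Fiberbits_model_1)       # number of observations
-2 * as.numeric(ll) + 2 * k        # should match AIC(Fiberbits_model_1)
-2 * as.numeric(ll) + k * log(n)   # should match BIC(Fiberbits_model_1)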

The next post is about Model Selection in logistic regression.
