Statinfer

203.2.5 Multi-collinearity and Individual Impact Of Variables in Logistic Regression

Multicollinearity

In the previous section, we studied goodness of fit for logistic regression.

  • When Y is binary, and its relation with X is therefore non-linear, we use logistic regression
  • Multicollinearity is an issue among the predictor variables, so it needs to be fixed in logistic regression as well
  • Otherwise the individual coefficients of the predictors will be affected by the interdependency
  • The process of identification is the same as in linear regression: check the VIF of each predictor

Multicollinearity in R

library(car)
## Warning: package 'car' was built under R version 3.1.3
vif(Fiberbits_model_1)
##                     income          months_on_network 
##                   4.590705                   4.641040 
##             Num_complaints        number_plan_changes 
##                   1.018607                   1.126892 
##                  relocated               monthly_bill 
##                   1.145847                   1.017565 
## technical_issues_per_month          Speed_test_result 
##                   1.020648                   1.206999
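To make the VIF numbers above less of a black box, here is a minimal sketch of the classic calculation: regress each predictor on the other predictors and compute 1 / (1 - R^2). The data is simulated and the names x1, x2, x3 and vif_by_hand are made up for illustration; they are not part of the Fiberbits data. Note that car's vif() applied to a glm works from the fitted model itself, so its values can differ somewhat from this auxiliary-regression version.

```r
# Simulated predictors (an assumption for illustration):
# x2 is built to be strongly correlated with x1, x3 is independent
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.45)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors
vif_by_hand <- sapply(names(predictors), function(v) {
  others <- setdiff(names(predictors), v)
  aux    <- lm(reformulate(others, response = v), data = predictors)
  1 / (1 - summary(aux)$r.squared)
})

print(round(vif_by_hand, 2))
```

In this toy data x1 and x2 should show clearly inflated values while x3 stays close to 1. A common rule of thumb treats a VIF above 5 (some use 10) as a sign that a predictor is too interdependent with the others.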

Individual Impact of Variables

  • Out of these predictor variables, what are the important variables?
  • If we have to choose the top 5 variables, what are they?
  • While selecting the model, we may want to drop a few of the less impactful variables.
  • How do we rank the predictor variables in the order of their importance?
  • We can simply look at the z-value of each variable and compare the absolute values
  • Or calculate the Wald chi-square, which for a single coefficient is the square of the z-value
  • Either way, a larger value means a more impactful variable, so the Wald chi-square helps in ranking the variables

Code-Individual Impact of Variables

library(caret)
## Warning: package 'caret' was built under R version 3.1.3
## Loading required package: lattice
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.1.3
varImp(Fiberbits_model_1, scale = FALSE)
##                             Overall
## income                     20.81981
## months_on_network          28.65421
## Num_complaints             22.81102
## number_plan_changes        24.93955
## relocated                  79.92677
## monthly_bill               13.99490
## technical_issues_per_month 54.58123
## Speed_test_result          93.43471

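For a glm model, varImp() returns the absolute value of each coefficient's z-value, so Speed_test_result, relocated and technical_issues_per_month are the three most impactful variables here, and monthly_bill is the least.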
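The same ranking can be read directly from the model summary without caret. The following is a minimal sketch on simulated data; toy_model, x1, x2, x3 and the chosen coefficients are assumptions made for illustration, not the Fiberbits model.

```r
# Hypothetical data: x1 has a strong effect, x2 a weak one, x3 none
set.seed(1)
n  <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x1 + 0.3 * x2))

toy_model <- glm(y ~ x1 + x2 + x3, family = binomial())

# Pull the z-values from the coefficient table, dropping the intercept row
coefs <- summary(toy_model)$coefficients
z     <- coefs[-1, "z value"]

# |z| and the Wald chi-square (z squared) give the same ordering
ranking <- data.frame(abs_z = abs(z), wald_chi2 = z^2)
ranking <- ranking[order(-ranking$abs_z), ]
print(ranking)
```

Because the Wald chi-square is just the square of z, sorting by either column gives the same order; the choice between them is a matter of presentation.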

Model Selection – AIC and BIC

  • AIC and BIC play a role similar to adjusted R-squared in linear regression: they reward fit but penalise extra parameters
  • The AIC of a stand-alone model has no real use, but it really helps when we are choosing between models.
  • Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models
  • If we are choosing between two models, the model with the lower AIC is preferred
  • AIC is an estimate of the information lost when a given model is used to represent the process that generates the data
  • AIC = -2 ln(L) + 2k
  • L is the maximum value of the likelihood function for the model
  • k is the number of estimated parameters (the independent variables plus the intercept)
  • BIC is an alternative to AIC with a slightly different penalty: BIC = -2 ln(L) + k ln(n), where n is the number of observations. We will follow either AIC or BIC throughout our analysis

Code-AIC and BIC

library(stats)
AIC(Fiberbits_model_1)
## [1] 98377.36
BIC(Fiberbits_model_1)
## [1] 98462.97
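To connect the formulas to the functions, here is a minimal sketch that recomputes AIC and BIC from the log-likelihood and then compares two candidate models. The data is simulated, and small_model, big_model, x1 and x2 are hypothetical names used only for illustration.

```r
# Hypothetical data: only x1 actually drives the response
set.seed(7)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(0.2 + 1.0 * x1))

small_model <- glm(y ~ x1,      family = binomial())
big_model   <- glm(y ~ x1 + x2, family = binomial())

# Ingredients of the formulas
ll    <- logLik(big_model)
k     <- attr(ll, "df")     # estimated parameters, intercept included
n_obs <- nobs(big_model)

aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + log(n_obs) * k

# The hand-computed values should match the built-in functions
print(c(by_hand = aic_by_hand, builtin = AIC(big_model)))
print(c(by_hand = bic_by_hand, builtin = BIC(big_model)))

# Side-by-side comparison: the model with the lower AIC is preferred
print(AIC(small_model, big_model))
```

Passing several models to AIC() or BIC() returns one row per model, which is a convenient way to apply the "lower is better" rule when choosing between them.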

The next post is about Model Selection in logistic regression.
