
204.1.6 Multiple Regression in Python

Dealing with more than one input variable in Linear Regression.

Link to the previous post: https://statinfer.com/204-1-5-r-squared-in-python/

 

In the last few posts of the machine learning blog series 204, we worked only with single-input-variable regression. In this post, we will see how to handle multiple input variables.

Multiple Regression

  • Uses multiple predictor variables instead of a single variable
  • With two predictors, we need to find the best-fitting plane instead of a line (with more predictors, a hyperplane)
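With two predictors the model is y = b0 + b1·x1 + b2·x2, which describes a plane. As a minimal sketch (on synthetic data generated from a known plane, not the airline data used in this series), NumPy's `np.linalg.lstsq` finds the least-squares plane once a column of ones is added for the intercept:

```python
import numpy as np

# Synthetic data (assumption for illustration): y is generated from a known plane plus noise
rng = np.random.default_rng(0)
n = 100
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 4.0 + 2.5 * x1 - 1.5 * x2 + rng.normal(0, 0.5, n)

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones(n), x1, x2])

# Least-squares solution of X @ b ≈ y: the plane with the smallest sum of squared residuals
b, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"Fitted plane: y = {b0:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```

The estimates should land close to the values 4.0, 2.5 and -1.5 used to generate the data. scikit-learn's `LinearRegression` solves the same least-squares problem.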

Code – Multiple Regression

In [21]:
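# `air` is the airline passengers DataFrame loaded in the earlier posts of this series
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Fit Passengers on two predictors at once
lr.fit(air[["Promotion_Budget", "Inter_metro_flight_ratio"]], air[["Passengers"]])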
Out[21]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
In [22]:
predictions = lr.predict(air[["Promotion_Budget", "Inter_metro_flight_ratio"]])
In [23]:
import statsmodels.formula.api as smf
# Same model via an R-style formula; statsmodels also reports a p-value for each coefficient
model = smf.ols(formula='Passengers ~ Promotion_Budget + Inter_metro_flight_ratio', data=air)
fitted = model.fit()
fitted.summary()
Out[23]:
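                            OLS Regression Results
==============================================================================
Dep. Variable:            Passengers   R-squared:                        0.934
Model:                           OLS   Adj. R-squared:                   0.932
Method:                Least Squares   F-statistic:                      540.5
Date:               Wed, 27 Jul 2016   Prob (F-statistic):            4.76e-46
Time:                       11:48:27   Log-Likelihood:                 -750.96
No. Observations:                 80   AIC:                              1508.
Df Residuals:                     77   BIC:                              1515.
Df Model:                          2
Covariance Type:           nonrobust
============================================================================================
                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------
Intercept                 2017.7724   1624.803      1.242      0.218   -1217.624    5253.169
Promotion_Budget             0.0707      0.002     28.297      0.000       0.066       0.076
Inter_metro_flight_ratio -2121.5208   2473.189     -0.858      0.394   -7046.268    2803.227
============================================================================================
Omnibus:                      26.259   Durbin-Watson:                    1.800
Prob(Omnibus):                 0.000   Jarque-Bera (JB):                 5.075
Skew:                         -0.096   Prob(JB):                        0.0791
Kurtosis:                      1.781   Cond. No.                      5.25e+06
==============================================================================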
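The sklearn and statsmodels cells above fit the same ordinary least squares model, so their intercept and slopes should agree: sklearn stores them in `intercept_` and `coef_`, statsmodels in `params`. Since the `air` data is not reproduced here, this sketch uses a hypothetical synthetic stand-in with the same column names (the values are made up), and it also checks that `predict` is simply the plane equation evaluated row by row.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for `air`: synthetic values, same column names
rng = np.random.default_rng(1)
n = 80
df = pd.DataFrame({
    "Promotion_Budget": rng.uniform(10_000, 500_000, n),
    "Inter_metro_flight_ratio": rng.uniform(0.3, 0.9, n),
})
df["Passengers"] = (2000 + 0.07 * df["Promotion_Budget"]
                    - 2000 * df["Inter_metro_flight_ratio"]
                    + rng.normal(0, 3000, n))

features = ["Promotion_Budget", "Inter_metro_flight_ratio"]

# scikit-learn fit
lr = LinearRegression().fit(df[features], df["Passengers"])

# statsmodels fit of the same model
fitted = smf.ols("Passengers ~ Promotion_Budget + Inter_metro_flight_ratio", data=df).fit()

# Both libraries should give the same intercept and slopes (up to floating-point error)
sk_params = np.r_[lr.intercept_, lr.coef_]
sm_params = fitted.params[["Intercept"] + features].to_numpy()
print(np.allclose(sk_params, sm_params))

# A prediction is the plane equation b0 + b1*x1 + b2*x2 evaluated at each row
manual = lr.intercept_ + df[features].to_numpy() @ lr.coef_
print(np.allclose(manual, lr.predict(df[features])))
```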

Individual Impact of variables

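  • Look at the p-value of each coefficient
  • The p-value is the probability of observing a t statistic at least as extreme as the one computed, assuming the null hypothesis (the coefficient is zero) is true. It is not the probability that the hypothesis is right.
  • Each variable's coefficient is tested individually for significance
  • When the true coefficient is zero and the usual OLS assumptions hold (independent, normally distributed errors), the ratio of the estimated coefficient to its standard error follows a t distribution with n - k - 1 degrees of freedom (n observations, k predictors)
  • The individual p-values tell us about the significance of each variable
  • At the usual 5% significance level, a variable is considered significant if its p-value is below 0.05. The smaller the p-value, the stronger the evidence that the coefficient is not zero.
  • Note that it is possible for the variables in a regression to produce a great fit together, and yet for very few of them to be individually significant; this typically happens when the predictors are strongly correlated with each other.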
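The last point can be illustrated with a small synthetic experiment (an assumption for illustration; the variables x1, x2 and y are hypothetical). Here x2 is almost an exact copy of x1, so the two predictors together explain y well, but the model cannot separate their individual contributions and the standard error of each coefficient is inflated.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumption for illustration): x2 is a near-duplicate of x1
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.01, n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(0, 1, n)
df = pd.DataFrame({"x1": x1, "x2": x2, "y": y})

both = smf.ols("y ~ x1 + x2", data=df).fit()
only_x1 = smf.ols("y ~ x1", data=df).fit()

# The joint model fits well overall...
print("R-squared:", round(both.rsquared, 3), " F-test p-value:", both.f_pvalue)
# ...but each coefficient has a huge standard error, so its individual p-value is usually large
print(both.bse)
print(both.pvalues)
# Standard error of x1 with its near-duplicate in the model vs. without it
print("SE ratio:", both.bse["x1"] / only_x1.bse["x1"])
```

The overall F-test is highly significant, while the standard error of x1 is many times larger than in the model that uses x1 alone; as a result the individual p-values are usually far above 0.05.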

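To test

$$H_0: \beta_i = 0 \qquad \text{vs.} \qquad H_a: \beta_i \neq 0$$

Test statistic

$$t = \frac{b_i}{s(b_i)}$$

where $b_i$ is the estimated coefficient and $s(b_i)$ its standard error (the "coef" and "std err" columns of the summary).

Reject $H_0$ if

$$t > t_{(\alpha/2;\; n-k-1)} \quad \text{or} \quad t < -t_{(\alpha/2;\; n-k-1)}$$

where $n$ is the number of observations and $k$ is the number of predictors.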
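As a sketch of this test (using SciPy's `scipy.stats.t`, which the original post does not use), the lines below recompute the t statistic, the critical value t(α/2; n-k-1) and the two-sided p-value for the Inter_metro_flight_ratio row of the first summary table, with n = 80, k = 2 and α = 0.05. Up to rounding, the results should be close to the t = -0.858 and P>|t| = 0.394 reported there.

```python
from scipy import stats

# Values copied from the two-predictor summary table (Inter_metro_flight_ratio row)
b_i = -2121.5208     # estimated coefficient b_i
se_b_i = 2473.189    # its standard error s(b_i)
n, k = 80, 2         # number of observations and number of predictors
alpha = 0.05

t_stat = b_i / se_b_i
df_resid = n - k - 1                              # 77, matches "Df Residuals"
t_crit = stats.t.ppf(1 - alpha / 2, df_resid)    # upper critical value t(alpha/2; n-k-1)
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)  # two-sided p-value

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p = {p_value:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")
```

Because |t| is well below the critical value (about 1.99 for 77 degrees of freedom), H0 is not rejected for this variable.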

Practice: Multiple Regression

  • Build a multiple regression model to predict the number of passengers
  • What is the R-squared value?
  • Are there any predictor variables that do not significantly impact the dependent variable?
In [24]:
# Build a multiple regression model to predict the number of passengers
# (`air` is the airline passengers DataFrame used throughout this series)
from sklearn.linear_model import LinearRegression
features = ["Promotion_Budget", "Inter_metro_flight_ratio", "Service_Quality_Score"]
lr = LinearRegression()
lr.fit(air[features], air[["Passengers"]])
predictions = lr.predict(air[features])
In [25]:
import statsmodels.formula.api as smf
# Add Service_Quality_Score to the formula; the summary gives a p-value per predictor
model = smf.ols(formula='Passengers ~ Promotion_Budget + Inter_metro_flight_ratio + Service_Quality_Score', data=air)
fitted = model.fit()
fitted.summary()
Out[25]:
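                            OLS Regression Results
==============================================================================
Dep. Variable:            Passengers   R-squared:                        0.951
Model:                           OLS   Adj. R-squared:                   0.949
Method:                Least Squares   F-statistic:                      495.6
Date:               Wed, 27 Jul 2016   Prob (F-statistic):            8.71e-50
Time:                       11:48:28   Log-Likelihood:                 -738.45
No. Observations:                 80   AIC:                              1485.
Df Residuals:                     76   BIC:                              1494.
Df Model:                          3
Covariance Type:           nonrobust
============================================================================================
                               coef    std err          t      P>|t|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------------
Intercept                 1.921e+04   3542.694      5.424      0.000    1.22e+04    2.63e+04
Promotion_Budget             0.0555      0.004     15.476      0.000       0.048       0.063
Inter_metro_flight_ratio -2003.4508   2129.095     -0.941      0.350   -6243.912    2237.010
Service_Quality_Score    -2802.0708    530.382     -5.283      0.000   -3858.419   -1745.723
============================================================================================
Omnibus:                       6.902   Durbin-Watson:                    2.312
Prob(Omnibus):                 0.032   Jarque-Bera (JB):                 2.759
Skew:                         -0.051   Prob(JB):                         0.252
Kurtosis:                      2.096   Cond. No.                      8.22e+06
==============================================================================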
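
  • What is the R-squared value?

0.951 (adjusted R-squared: 0.949)

  • Are there any predictor variables that do not significantly impact the dependent variable?

Yes, Inter_metro_flight_ratio. Its p-value (0.350) is well above 0.05, and its 95% confidence interval (-6243.912 to 2237.010) contains zero, so we cannot reject H0: βi = 0 for this variable.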
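Both practice answers can also be read from the fitted results object instead of the printed table: statsmodels exposes `rsquared`, `rsquared_adj` and `pvalues` on the result of `.fit()`, so for the model above they would be read from `fitted`. As a hedged sketch on synthetic data (the columns x1, x2, x3 are hypothetical, and x3 is generated with no real effect on y):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in (assumption for illustration): x3 has no real effect on y
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(0, 1, n),
    "x2": rng.normal(0, 1, n),
    "x3": rng.normal(0, 1, n),
})
df["y"] = 3 + 2 * df["x1"] - 1.5 * df["x2"] + rng.normal(0, 1, n)

fitted = smf.ols("y ~ x1 + x2 + x3", data=df).fit()

print("R-squared:", round(fitted.rsquared, 3))
print("Adj. R-squared:", round(fitted.rsquared_adj, 3))

# Predictors whose p-value is 0.05 or higher (the intercept is excluded)
pvals = fitted.pvalues.drop("Intercept")
not_significant = pvals[pvals >= 0.05].index.tolist()
print("Not significant at 5%:", not_significant)
```

Typically only x3 is listed. Because a truly irrelevant predictor still gets a p-value below 0.05 about 5% of the time, a different seed can occasionally give an empty list.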

The next post covers adjusted R-squared in Python.

Link to the next post: https://statinfer.com/204-1-7-adjusted-r-squared-in-python/

Statinfer

Statinfer derived from Statistical inference. We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.

Contact Us

info@statinfer.com

+91- 9676098897

+91- 9494762485

 

© 2020. All Rights Reserved.