Link to the previous post: https://statinfer.com/204-1-5-r-squared-in-python/

The last few posts in the machine learning blog series 204 covered regression with a single input variable. In this post, we will see how to handle multiple input variables.

Multiple Regression

Multiple regression uses several predictor variables instead of a single one
With two predictors, we need to find the best-fitting plane rather than a line
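The idea of fitting a plane can be written as a model equation. This is the standard textbook form rather than something stated in the post; the symbols $\varepsilon$ (the error term) and $x_{1}, \dots, x_{k}$ (the predictors) are notation assumed here, with $k$ the number of predictors.

```latex
y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{k} x_{k} + \varepsilon
```

With $k = 2$ the fitted surface is a plane; with more predictors it is a hyperplane. Least squares picks the coefficients that minimise the sum of squared differences between the observed and fitted values of $y$.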

Code – Multiple Regression

In [21]:

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Fit Passengers on two predictors; one list of column names selects both columns
lr.fit(air[["Promotion_Budget", "Inter_metro_flight_ratio"]], air[["Passengers"]])

Out[21]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [22]:

predictions = lr.predict(air[["Promotion_Budget", "Inter_metro_flight_ratio"]])
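Once the model is fitted, the estimated plane can be read off the model object. The sketch below is only an illustration: it uses synthetic, noise-free data (an assumption, not the air dataset), so the known coefficients should be recovered almost exactly. `intercept_`, `coef_` and `score` are the scikit-learn attributes and method for the intercept, the slopes and R-squared.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data (an assumption for illustration; not the air dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))      # two hypothetical predictors
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]   # points lying exactly on a plane

lr = LinearRegression()
lr.fit(X, y)

print("intercept:", lr.intercept_)        # estimate of beta_0
print("coefficients:", lr.coef_)          # estimates of beta_1 and beta_2
print("R-squared:", lr.score(X, y))       # share of variance explained
```

The same three lines of inspection should work after fitting on DataFrame columns such as those of `air`. Note that scikit-learn does not report P-values, which is why the post turns to statsmodels next.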

In [23]:

import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Promotion_Budget+Inter_metro_flight_ratio', data=air)
fitted = model.fit()
fitted.summary()

Out[23]:

OLS Regression Results
Dep. Variable:	Passengers	R-squared:	0.934
Model:	OLS	Adj. R-squared:	0.932
Method:	Least Squares	F-statistic:	540.5
Date:	Wed, 27 Jul 2016	Prob (F-statistic):	4.76e-46
Time:	11:48:27	Log-Likelihood:	-750.96
No. Observations:	80	AIC:	1508.
Df Residuals:	77	BIC:	1515.
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[95.0% Conf. Int.]
Intercept	2017.7724	1624.803	1.242	0.218	-1217.624 5253.169
Promotion_Budget	0.0707	0.002	28.297	0.000	0.066 0.076
Inter_metro_flight_ratio	-2121.5208	2473.189	-0.858	0.394	-7046.268 2803.227

Omnibus:	26.259	Durbin-Watson:	1.800
Prob(Omnibus):	0.000	Jarque-Bera (JB):	5.075
Skew:	-0.096	Prob(JB):	0.0791
Kurtosis:	1.781	Cond. No.	5.25e+06

Individual Impact of variables

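Look at the P-value
The P-value is the probability of seeing a coefficient estimate at least this far from zero if the variable truly had no effect (that is, if the null hypothesis were true). It is not the probability of the hypothesis being right.
Each individual variable's coefficient is tested for significance
Under the null hypothesis, each estimated coefficient divided by its standard error follows a t distribution.
Individual P-values tell us about the significance of each variable
A variable is conventionally called significant if its P-value is less than 5%. The smaller the P-value, the stronger the evidence that the variable has an effect
Note that it is possible for all the variables in a regression together to produce a great fit, and yet for very few of them to be individually significant.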
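The P-values in the summary table can also be read programmatically. The following is a minimal sketch on synthetic data (the column names `x1`, `x2`, `y` and the data itself are assumptions for illustration): `y` depends on `x1` but not on `x2`, and the fitted model's `pvalues` attribute is filtered at the 5% level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as sm

# Synthetic data (an assumption for illustration; not the air dataset)
rng = np.random.default_rng(42)
n = 80
df = pd.DataFrame({
    "x1": rng.normal(size=n),   # predictor that really drives y
    "x2": rng.normal(size=n),   # predictor unrelated to y
})
df["y"] = 5.0 + 2.0 * df["x1"] + rng.normal(size=n)

fitted = sm.ols(formula="y ~ x1 + x2", data=df).fit()

# pvalues is a pandas Series indexed by term name
pvalues = fitted.pvalues
significant = pvalues[pvalues < 0.05].index.tolist()

print(pvalues)
print("Significant at 5%:", significant)
```

Applied to the model fitted on `air`, the same filter would be expected to flag the terms whose P>|t| column is below 0.05.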

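To test

$H_{0}: \beta_{i} = 0$

$H_{a}: \beta_{i} \neq 0$

Test Statistic

$t = \frac{b_{i}}{s(b_{i})}$

where $b_{i}$ is the estimated coefficient and $s(b_{i})$ is its standard error.

Reject $H_{0}$ if

$t > t(\frac{\alpha}{2}; n - k - 1)$

or

$t < -t(\frac{\alpha}{2}; n - k - 1)$

where $n$ is the number of observations and $k$ is the number of predictor variables.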
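The test can be worked through by hand using the Inter_metro_flight_ratio row of Out[23]. This is a sketch under stated assumptions: the coefficient and standard error are copied from the summary table, the 5% level is the conventional choice, and `scipy.stats.t` is used for the critical value and the two-sided P-value.

```python
from scipy import stats

# Values copied from the Inter_metro_flight_ratio row of Out[23]
b = -2121.5208        # estimated coefficient
se = 2473.189         # standard error of the coefficient
n, k = 80, 2          # observations and number of predictors
alpha = 0.05          # conventional 5% significance level (assumption)

df_resid = n - k - 1                              # residual degrees of freedom
t_stat = b / se                                   # test statistic
t_crit = stats.t.ppf(1 - alpha / 2, df_resid)     # two-sided critical value
p_value = 2 * stats.t.sf(abs(t_stat), df_resid)   # two-sided P-value

reject = abs(t_stat) > t_crit
print("t statistic:", round(t_stat, 3))
print("critical value:", round(t_crit, 3))
print("p-value:", round(p_value, 3))
print("reject H0:", reject)
```

The residual degrees of freedom match the "Df Residuals" entry of the summary, and the absolute t statistic falls short of the critical value, so the null hypothesis is not rejected for this variable.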

Practice : Multiple Regression

Build a multiple regression model to predict the number of passengers
What is the R-squared value?
Are there any predictor variables that are not impacting the dependent variable?

In [24]:

#Build a multiple regression model to predict the number of passengers

from sklearn.linear_model import LinearRegression
lr = LinearRegression()
# Select all three predictors with one list of column names
predictors = ["Promotion_Budget", "Inter_metro_flight_ratio", "Service_Quality_Score"]
lr.fit(air[predictors], air[["Passengers"]])
predictions = lr.predict(air[predictors])

In [25]:

import statsmodels.formula.api as sm
model = sm.ols(formula='Passengers ~ Promotion_Budget+Inter_metro_flight_ratio+Service_Quality_Score', data=air)
fitted = model.fit()
fitted.summary()

Out[25]:

OLS Regression Results
Dep. Variable:	Passengers	R-squared:	0.951
Model:	OLS	Adj. R-squared:	0.949
Method:	Least Squares	F-statistic:	495.6
Date:	Wed, 27 Jul 2016	Prob (F-statistic):	8.71e-50
Time:	11:48:28	Log-Likelihood:	-738.45
No. Observations:	80	AIC:	1485.
Df Residuals:	76	BIC:	1494.
Df Model:	3
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[95.0% Conf. Int.]
Intercept	1.921e+04	3542.694	5.424	0.000	1.22e+04 2.63e+04
Promotion_Budget	0.0555	0.004	15.476	0.000	0.048 0.063
Inter_metro_flight_ratio	-2003.4508	2129.095	-0.941	0.350	-6243.912 2237.010
Service_Quality_Score	-2802.0708	530.382	-5.283	0.000	-3858.419 -1745.723

Omnibus:	6.902	Durbin-Watson:	2.312
Prob(Omnibus):	0.032	Jarque-Bera (JB):	2.759
Skew:	-0.051	Prob(JB):	0.252
Kurtosis:	2.096	Cond. No.	8.22e+06

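What is the R-squared value?

0.951

Are there any predictor variables that are not impacting the dependent variable?

Inter_metro_flight_ratio. Its P-value of 0.350 is well above 0.05, so its coefficient is not significantly different from zero.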

The next post covers adjusted R-squared in Python.

Link to the next post: https://statinfer.com/204-1-7-adjusted-r-squared-in-python/

23rd January 2018

204.1.6 Multiple Regression in Python

Dealing with more than one input variable in Linear Regression.

Statinfer