
204.1.8 Practice : Multiple Regression Issues

Practicing the multiple-variable linear regression model.

Link to the previous post: https://statinfer.com/204-1-7-adjusted-r-squared-in-python/

In the last post of this session, we covered the basics of multiple-variable linear regression. In this post, we will practice building such a model and look at an issue that can show up in multiple regression.

Practice : Multiple Regression Issues

  • Import Final Exam Score data
  • Build a model to predict final score using the rest of the variables.
  • How are Sem2_Math & Final score related? As Sem2_Math score increases, what happens to Final score?
  • Remove “Sem1_Math” variable from the model and rebuild the model
  • Is there any change in R square or Adj R square?
  • How are Sem2_Math & Final score related now? As Sem2_Math score increases, what happens to Final score?
  • Draw a scatter plot between Sem1_Math & Sem2_Math
  • Find the correlation between Sem1_Math & Sem2_Math
In [34]:
#Import Final Exam Score data
import pandas as pd
final_exam=pd.read_csv("datasets\\Final Exam\\Final Exam Score.csv")
In [35]:
#Size of the data
final_exam.shape
Out[35]:
(24, 5)
In [36]:
#Variable names
final_exam.columns
Out[36]:
Index(['Sem1_Science', 'Sem2_Science', 'Sem1_Math', 'Sem2_Math',
       'Final_exam_marks'],
      dtype='object')
In [37]:
#Build a model to predict final score using the rest of the variables.
from sklearn.linear_model import LinearRegression

#Predictor columns: every variable except the target
features1 = ["Sem1_Science", "Sem2_Science", "Sem1_Math", "Sem2_Math"]
lr1 = LinearRegression()
lr1.fit(final_exam[features1], final_exam[["Final_exam_marks"]])
predictions1 = lr1.predict(final_exam[features1])

#Fit the same model with statsmodels to get the full regression summary
import statsmodels.formula.api as sm
model1 = sm.ols(formula='Final_exam_marks ~ Sem1_Science+Sem2_Science+Sem1_Math+Sem2_Math', data=final_exam)
fitted1 = model1.fit()
fitted1.summary()
Out[37]:
OLS Regression Results

Dep. Variable:      Final_exam_marks    R-squared:            0.990
Model:              OLS                 Adj. R-squared:       0.987
Method:             Least Squares       F-statistic:          452.3
Date:               Wed, 27 Jul 2016    Prob (F-statistic):   1.50e-18
Time:               11:48:28            Log-Likelihood:       -38.099
No. Observations:   24                  AIC:                  86.20
Df Residuals:       19                  BIC:                  92.09
Df Model:           4
Covariance Type:    nonrobust

                coef      std err   t        P>|t|   [95.0% Conf. Int.]
Intercept       -1.6226   1.999     -0.812   0.427   -5.806   2.561
Sem1_Science    0.1738    0.063     2.767    0.012   0.042    0.305
Sem2_Science    0.2785    0.052     5.379    0.000   0.170    0.387
Sem1_Math       0.7890    0.197     4.002    0.001   0.376    1.202
Sem2_Math       -0.2063   0.191     -1.078   0.294   -0.607   0.194

Omnibus:          6.343    Durbin-Watson:      1.863
Prob(Omnibus):    0.042    Jarque-Bera (JB):   4.332
Skew:             0.973    Prob(JB):           0.115
Kurtosis:         3.737    Cond. No.           1.20e+03
In [38]:
fitted1.rsquared
Out[38]:
0.98960765475687229
  • How are Sem2_Math & Final score related? As Sem2_Math score increases, what happens to Final score?

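As Sem2_Math score increases, Final score decreases: in this model the coefficient of Sem2_Math is -0.2063, with the other three variables held fixed. Note that its p-value is 0.294, so this negative effect is not statistically significant.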
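
As a rough check on how to read the Sem2_Math row of the summary, its t value and 95% confidence interval can be recomputed from the rounded coefficient (-0.2063) and standard error (0.191) reported there. This is only a minimal sketch in plain Python. The critical value 2.093 is an assumption taken from a standard t table (two-sided 95%, 19 residual degrees of freedom) and does not appear in the output above.

```python
# Values copied (rounded) from the model1 summary row for Sem2_Math
coef = -0.2063
std_err = 0.191

# Assumption: two-sided 95% critical value of Student's t with 19 degrees
# of freedom (Df Residuals = 19), taken from a standard t table
t_crit = 2.093

# t statistic = coefficient / standard error
t_value = coef / std_err

# 95% confidence interval = coefficient +/- t_crit * standard error
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(round(t_value, 2))
print(round(ci_low, 2), round(ci_high, 2))

# True when the interval contains zero
print(ci_low < 0 < ci_high)
```

The results should agree with the summary row up to rounding. An interval that contains zero means the data do not pin down the sign of the Sem2_Math coefficient in this model.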

In [39]:
#Remove "Sem1_Math" variable from the model and rebuild the model
from sklearn.linear_model import LinearRegression

#Predictor columns without Sem1_Math
features2 = ["Sem1_Science", "Sem2_Science", "Sem2_Math"]
lr2 = LinearRegression()
lr2.fit(final_exam[features2], final_exam[["Final_exam_marks"]])
predictions2 = lr2.predict(final_exam[features2])

#Fit the same reduced model with statsmodels to get the full regression summary
import statsmodels.formula.api as sm
model2 = sm.ols(formula='Final_exam_marks ~ Sem1_Science+Sem2_Science+Sem2_Math', data=final_exam)
fitted2 = model2.fit()
fitted2.summary()
Out[39]:
OLS Regression Results

Dep. Variable:      Final_exam_marks    R-squared:            0.981
Model:              OLS                 Adj. R-squared:       0.978
Method:             Least Squares       F-statistic:          341.4
Date:               Wed, 27 Jul 2016    Prob (F-statistic):   2.44e-17
Time:               11:48:29            Log-Likelihood:       -45.436
No. Observations:   24                  AIC:                  98.87
Df Residuals:       20                  BIC:                  103.6
Df Model:           3
Covariance Type:    nonrobust

                coef      std err   t        P>|t|   [95.0% Conf. Int.]
Intercept       -2.3986   2.632     -0.911   0.373   -7.889   3.092
Sem1_Science    0.2130    0.082     2.595    0.017   0.042    0.384
Sem2_Science    0.2686    0.068     3.925    0.001   0.126    0.411
Sem2_Math       0.5320    0.067     7.897    0.000   0.391    0.673

Omnibus:          5.869    Durbin-Watson:      2.424
Prob(Omnibus):    0.053    Jarque-Bera (JB):   3.793
Skew:             0.864    Prob(JB):           0.150
Kurtosis:         3.898    Cond. No.           1.03e+03
  • Is there any change in R square or Adj R square?

Model     R2      Adj R2
model1    0.990   0.987
model2    0.981   0.978

Both R square and Adj R square drop only slightly (by about 0.009) after removing Sem1_Math.
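
The Adj R square values in this comparison can be reproduced from R square, the number of observations (24) and the number of predictors in each model (4 and 3). The helper name adjusted_r2 below is hypothetical, a minimal sketch of the standard adjusted R-squared formula rather than anything statsmodels exposes under that name. The model2 input is the rounded R-squared from its summary, so the result is approximate.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared from R-squared, sample size and predictor count."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# model1: 24 observations, 4 predictors, R-squared as printed by fitted1.rsquared
adj1 = adjusted_r2(0.98960765475687229, 24, 4)

# model2: 24 observations, 3 predictors, R-squared rounded as in its summary
adj2 = adjusted_r2(0.981, 24, 3)

print(round(adj1, 3), round(adj2, 3))  # prints 0.987 0.978
```

The penalty term grows with the number of predictors, which is why Adj R square is always a little below R square in both models.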
  • How are Sem2_Math & Final score related now? As Sem2_Math score increases, what happens to Final score?

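As Sem2_Math score increases, Final score also increases: the coefficient of Sem2_Math is now 0.5320 and it is highly significant (P>|t| is shown as 0.000). The sign of the relationship has flipped compared with the first model.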

In [40]:
#Draw a scatter plot between Sem1_Math & Sem2_Math

import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(final_exam.Sem1_Math,final_exam.Sem2_Math)
Out[40]:
<matplotlib.collections.PathCollection at 0xb2cf0f0>
In [41]:
#Find the correlation between Sem1_Math & Sem2_Math
import numpy as np
np.corrcoef(final_exam.Sem1_Math,final_exam.Sem2_Math)
Out[41]:
array([[ 1.       ,  0.9924948],
       [ 0.9924948,  1.       ]])
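
To see how two highly correlated predictors can produce a sign change like the one seen for Sem2_Math, here is a minimal sketch in plain Python. The numbers are made up for illustration and are not the Final Exam Score data. The names x1, x2, centered and dot are hypothetical, and the slopes come from the ordinary least-squares normal equations for one and two predictors.

```python
# Made-up data (NOT the Final Exam Score file): x2 is almost a copy of x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
# y is built so that, with x1 held fixed, a higher x2 lowers y
y = [1.0 * a - 0.2 * b for a, b in zip(x1, x2)]

def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))

c1, c2, cy = centered(x1), centered(x2), centered(y)
s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
s1y, s2y = dot(c1, cy), dot(c2, cy)

# Pearson correlation between the two predictors
corr_x1_x2 = s12 / (s11 * s22) ** 0.5

# Least-squares slope of x2 when x1 is also in the model
# (solution of the two-predictor normal equations)
slope_x2_with_x1 = (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)

# Least-squares slope of x2 when it is the only predictor
slope_x2_alone = s2y / s22

print(round(corr_x1_x2, 3))
print(round(slope_x2_with_x1, 3), round(slope_x2_alone, 3))
```

In this toy example the slope of x2 is negative when x1 is in the model and positive when x2 is used alone, because x2 then also carries the effect of the nearly identical x1.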

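Sem1_Math and Sem2_Math are almost perfectly correlated (0.99), which is why the sign of the Sem2_Math coefficient changed once Sem1_Math was removed from the model. The next post is about issues of multicollinearity in Python.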

Link to the next post : https://statinfer.com/204-1-9-issue-of-multicollinearity-in-python/
