204.1.11 Interaction Terms

Trick to improve model accuracy.

Link to the previous post: https://statinfer.com/204-1-10-practice-multiple-regression-with-multicollinearity/

 

This is the final post in our Linear Regression Series.

This post is about a trick called Interaction Terms, which may improve the accuracy of the model.

Interaction Terms

  • An interaction term is a variable derived from two or more pre-existing variables, typically their product or their ratio.
  • Adding interaction terms might help in improving the prediction accuracy of the model.
  • The addition of interaction terms needs prior knowledge of the dataset and variables.
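As a rough illustration of the points above, here is a minimal sketch on synthetic data (not the web product sales dataset). The variable names, coefficients and noise level are assumptions made up for the example, and it uses plain NumPy least squares rather than statsmodels so that it stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors: a 0/1 holiday flag and a weekday number 1..7.
holiday = rng.integers(0, 2, size=n).astype(float)
weekday = rng.integers(1, 8, size=n).astype(float)

# Synthetic response in which the weekday effect is stronger on holidays.
sales = (1000 + 200 * holiday + 100 * weekday
         + 400 * holiday * weekday + rng.normal(0, 50, size=n))

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R-squared."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Main effects only versus main effects plus the product term.
r2_main = r_squared(np.column_stack([holiday, weekday]), sales)
r2_inter = r_squared(np.column_stack([holiday, weekday, holiday * weekday]), sales)

print("R-squared without interaction:", round(r2_main, 3))
print("R-squared with interaction:   ", round(r2_inter, 3))
```

The product column helps here only because the synthetic data was generated with that interaction. On real data, whether a product or ratio is worth adding depends on what you know about the variables.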

Practice : Interaction Terms

  • Add a few interaction terms to the previous web product sales model and check whether the accuracy improves.
In [70]:
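import statsmodels.formula.api as smf

# Webpage_Product_Sales is the DataFrame loaded in the previous posts of this series.
# In a formula, Holiday*Weekday expands to Holiday + Weekday + Holiday:Weekday,
# so the only new term compared with the earlier model is the product Holiday:Weekday.
model4 = smf.ols(formula='Sales ~ Server_Down_time_Sec+Holiday+Special_Discount+Online_Ad_Paid_ref_links+Social_Network_Ref_links+Month+Weekday+DayofMonth+Holiday*Weekday', data=Webpage_Product_Sales)
fitted4 = model4.fit()
fitted4.summary()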
Out[70]:
OLS Regression Results

Dep. Variable:      Sales               R-squared:            0.865
Model:              OLS                 Adj. R-squared:       0.863
Method:             Least Squares       F-statistic:          473.6
Date:               Wed, 27 Jul 2016    Prob (F-statistic):   2.17e-282
Time:               12:59:08            Log-Likelihood:       -6355.7
No. Observations:   675                 AIC:                  1.273e+04
Df Residuals:       665                 BIC:                  1.278e+04
Df Model:           9
Covariance Type:    nonrobust

                               coef    std err         t    P>|t|    [95.0% Conf. Int.]
Intercept                 6753.6923    708.791     9.528    0.000    5361.955  8145.430
Server_Down_time_Sec      -140.4922     12.044   -11.665    0.000    -164.141  -116.844
Holiday                   2201.8694   1232.336     1.787    0.074    -217.870  4621.608
Special_Discount          4749.0044    344.145    13.799    0.000    4073.262  5424.747
Online_Ad_Paid_ref_links     5.9515      0.250    23.805    0.000       5.461     6.442
Social_Network_Ref_links     7.0657      0.353    19.994    0.000       6.372     7.760
Month                      480.3156     35.597    13.493    0.000     410.420   550.212
Weekday                   1164.8864     59.143    19.696    0.000    1048.756  1281.017
DayofMonth                  47.0967     13.073     3.603    0.000      21.428    72.766
Holiday:Weekday           4294.6865    281.683    15.247    0.000    3741.592  4847.782

Omnibus:          7.552    Durbin-Watson:       0.867
Prob(Omnibus):    0.023    Jarque-Bera (JB):    7.305
Skew:             0.219    Prob(JB):            0.0259
Kurtosis:         2.740    Cond. No.            2.32e+04
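One way to read the Holiday:Weekday coefficient is as a change in the Weekday slope when Holiday is switched on. The short sketch below only does that arithmetic with the coefficients from the summary above. It assumes Holiday is coded 0/1, which the output does not state explicitly, and the helper name weekday_slope is hypothetical.

```python
# Coefficients copied from the OLS summary above.
coef_weekday = 1164.8864
coef_interaction = 4294.6865  # Holiday:Weekday

def weekday_slope(holiday):
    """Change in predicted Sales per unit of Weekday, for a given Holiday value.

    Assumes Holiday is a 0/1 indicator (an assumption, not confirmed by the output).
    """
    return coef_weekday + coef_interaction * holiday

print("Weekday slope on a non-holiday:", round(weekday_slope(0), 4))
print("Weekday slope on a holiday:    ", round(weekday_slope(1), 4))
```

Under that coding assumption, the fitted model says the Weekday effect on Sales is several times larger on holidays than on ordinary days, which is the pattern the interaction term was added to capture.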

Conclusion – Regression

  • Try adding polynomial and interaction terms to your regression model. Sometimes they work like a charm.
  • Adjusted R-squared measures the fit on the training (in-sample) data only, so it cannot tell us how the final model will perform on new data. We may have to perform cross-validation to estimate the testing error.
  • Outliers can influence the regression line, so we need to clean the data and deal with outliers before building the model.
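The cross-validation idea mentioned above can be sketched as follows. This is a hand-rolled k-fold loop in NumPy on synthetic data, intended only as an illustration: the function name kfold_rmse, the number of folds and the generated data are all assumptions, not part of the original practice exercise.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Average out-of-fold RMSE of a least-squares fit with an intercept."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    X1 = np.column_stack([np.ones(len(y)), X])
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only, then score on the held-out fold.
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        pred = X1[test] @ beta
        scores.append(np.sqrt(np.mean((y[test] - pred) ** 2)))
    return float(np.mean(scores))

# Synthetic data with a genuine interaction between two hypothetical predictors.
rng = np.random.default_rng(1)
n = 600
x1 = rng.integers(0, 2, size=n).astype(float)
x2 = rng.integers(1, 8, size=n).astype(float)
y = 1000 + 200 * x1 + 100 * x2 + 400 * x1 * x2 + rng.normal(0, 50, size=n)

rmse_main = kfold_rmse(np.column_stack([x1, x2]), y)
rmse_inter = kfold_rmse(np.column_stack([x1, x2, x1 * x2]), y)

print("CV RMSE without interaction:", round(rmse_main, 1))
print("CV RMSE with interaction:   ", round(rmse_inter, 1))
```

Comparing the held-out error of the two models is a more honest check than comparing their training R-squared values, because adding any term can only raise R-squared on the training data.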
