204.1.11 Interaction Terms
Link to the previous post: https://statinfer.com/204-1-10-practice-multiple-regression-with-multicollinearity/
This is the final post in our Linear Regression Series.
This post covers interaction terms, a trick that may improve the accuracy of a model.
Interaction Terms
- An interaction term is a derived variable built from two or more pre-existing variables, for example their product or their ratio.
- Adding interaction terms might help improve the prediction accuracy of the model.
- Choosing which interaction terms to add requires prior knowledge of the dataset and its variables.
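To make the idea concrete, here is a minimal sketch in Python with NumPy on synthetic data. The variables `x1` and `x2`, the coefficients and the noise level are invented for illustration; they are not the web product sales data. The interaction term is simply an extra column, the product `x1 * x2`, appended to the design matrix before an ordinary least squares fit. Because the synthetic target really contains that product, the model with the interaction column should show a clearly higher R-squared.

```python
import numpy as np

def r_squared(X, y):
    """Fit OLS by least squares and return the in-sample R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 500
x1 = rng.integers(0, 2, size=n).astype(float)  # hypothetical 0/1 flag (think Holiday)
x2 = rng.uniform(0, 7, size=n)                 # hypothetical numeric variable (think Weekday)
noise = rng.normal(0, 1, size=n)

# Synthetic target: the slope on x2 depends on x1, i.e. it contains an interaction
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 3.0 * x1 * x2 + noise

ones = np.ones(n)
X_main = np.column_stack([ones, x1, x2])            # main effects only
X_inter = np.column_stack([ones, x1, x2, x1 * x2])  # main effects plus the product term

r2_main = r_squared(X_main, y)
r2_inter = r_squared(X_inter, y)
print(f"R-squared without interaction: {r2_main:.3f}")
print(f"R-squared with interaction:    {r2_inter:.3f}")
```

A ratio term such as `x2 / x3` would be added as a column in exactly the same way, provided the denominator can never be zero.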
Practice : Interaction Terms
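- Add a few interaction terms to the previous web product sales model and compare the accuracy with the earlier model. The summary below comes from the model with an added Holiday:Weekday interaction term.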
                             OLS Regression Results
================================================================================
Dep. Variable:                 Sales   R-squared:                          0.865
Model:                           OLS   Adj. R-squared:                     0.863
Method:                Least Squares   F-statistic:                        473.6
Date:               Wed, 27 Jul 2016   Prob (F-statistic):             2.17e-282
Time:                       12:59:08   Log-Likelihood:                   -6355.7
No. Observations:                675   AIC:                            1.273e+04
Df Residuals:                    665   BIC:                            1.278e+04
Df Model:                          9
Covariance Type:           nonrobust
================================================================================
                              coef   std err       t   P>|t|  [95.0% Conf. Int.]
--------------------------------------------------------------------------------
Intercept                6753.6923   708.791   9.528   0.000  5361.955  8145.430
Server_Down_time_Sec     -140.4922    12.044 -11.665   0.000  -164.141  -116.844
Holiday                  2201.8694  1232.336   1.787   0.074  -217.870  4621.608
Special_Discount         4749.0044   344.145  13.799   0.000  4073.262  5424.747
Online_Ad_Paid_ref_links    5.9515     0.250  23.805   0.000     5.461     6.442
Social_Network_Ref_links    7.0657     0.353  19.994   0.000     6.372     7.760
Month                     480.3156    35.597  13.493   0.000   410.420   550.212
Weekday                  1164.8864    59.143  19.696   0.000  1048.756  1281.017
DayofMonth                 47.0967    13.073   3.603   0.000    21.428    72.766
Holiday:Weekday          4294.6865   281.683  15.247   0.000  3741.592  4847.782
================================================================================
Omnibus:                       7.552   Durbin-Watson:                      0.867
Prob(Omnibus):                 0.023   Jarque-Bera (JB):                   7.305
Skew:                          0.219   Prob(JB):                          0.0259
Kurtosis:                      2.740   Cond. No.                        2.32e+04
================================================================================
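Two quick checks against the reported numbers, written as plain Python arithmetic. With the `Holiday:Weekday` term in the model, the slope on `Weekday` is no longer a single number: it is the `Weekday` coefficient on non-holidays, and that coefficient plus the `Holiday:Weekday` coefficient on holidays. This sketch assumes `Holiday` is coded 0/1, which the table itself does not state. The adjusted R-squared can also be recovered from R-squared, the number of observations and `Df Model` with the usual formula.

```python
# Coefficients copied from the summary table above
b_weekday = 1164.8864
b_holiday_x_weekday = 4294.6865

def weekday_slope(holiday):
    """Change in predicted Sales per one-unit increase in Weekday.

    Assumes Holiday is a 0/1 indicator variable.
    """
    return b_weekday + b_holiday_x_weekday * holiday

print("Weekday slope on non-holidays:", weekday_slope(0))
print("Weekday slope on holidays:    ", round(weekday_slope(1), 4))

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2, n, p = 0.865, 675, 9  # R-squared, No. Observations, Df Model
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print("Adjusted R-squared:", round(adj_r2, 3))  # 0.863, matching the table
```

Note that n - p - 1 = 665 is exactly the `Df Residuals` value reported in the summary.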
Conclusion – Regression
- Try adding polynomial and interaction terms to your regression model. Sometimes they work like a charm.
- Adjusted R-squared is a good measure of training (in-sample) error, but it cannot tell us how the final model will perform on new data. To estimate the test error, we may have to perform cross-validation.
- Outliers can strongly influence the regression line, so we need to take care of data sanitization (cleaning) before building the regression model.
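Because adjusted R-squared only describes in-sample fit, estimating the test error needs held-out data. Below is a minimal plain-NumPy sketch of k-fold cross-validation, assuming synthetic data generated the same hypothetical way as in the earlier interaction example (the variables and coefficients are invented, not the sales dataset). In practice a library routine such as scikit-learn's `cross_val_score` would usually do this job.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average out-of-fold mean squared error of an OLS fit over k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)  # every row not in the test fold
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred = X[test_idx] @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Hypothetical synthetic data containing a true x1 * x2 interaction
rng = np.random.default_rng(1)
n = 500
x1 = rng.integers(0, 2, size=n).astype(float)
x2 = rng.uniform(0, 7, size=n)
y = 2.0 + 1.0 * x1 + 0.5 * x2 + 3.0 * x1 * x2 + rng.normal(0, 1, size=n)

ones = np.ones(n)
X_main = np.column_stack([ones, x1, x2])
X_inter = np.column_stack([ones, x1, x2, x1 * x2])

cv_main = kfold_mse(X_main, y)
cv_inter = kfold_mse(X_inter, y)
print(f"5-fold CV MSE without interaction: {cv_main:.3f}")
print(f"5-fold CV MSE with interaction:    {cv_inter:.3f}")
```

Here each row is scored only by a model that never saw it during fitting, so a lower cross-validated error is better evidence that an interaction term helps than a higher in-sample R-squared alone.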