Link to the previous post : https://statinfer.com/204-1-10-practice-multiple-regression-with-multicollinearity/
This is the final post in our Linear Regression Series.
This post is about a trick called Interaction Terms, which may improve the accuracy of the model.
Interaction Terms
- An interaction term is a variable derived from two or more pre-existing variables, typically their product or their ratio.
- Adding interaction terms might help in improving the prediction accuracy of the model.
- The addition of interaction terms needs prior knowledge of the dataset and variables.
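As a minimal sketch of what "derived variable" means here, the snippet below builds one product term and one ratio term with pandas. The toy DataFrame and its column names (`price`, `units`, `visits`) are hypothetical and are not part of the web product sales data.

```python
import pandas as pd

# Hypothetical toy data; these columns are illustrative only.
df = pd.DataFrame({
    "price": [10.0, 12.0, 8.0, 15.0],
    "units": [3, 5, 2, 4],
    "visits": [30, 50, 40, 20],
})

# Product of two variables: lets the effect of one depend on the level of the other.
df["price_x_units"] = df["price"] * df["units"]

# Ratio of two variables: here, units sold per visit.
df["units_per_visit"] = df["units"] / df["visits"]

print(df)
```

Whether a product or a ratio makes sense depends on what the variables mean, which is why prior knowledge of the dataset matters.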
Practice : Interaction Terms
- Add a few interaction terms to the previous web product sales model and see whether the accuracy increases.
In [70]:
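import statsmodels.formula.api as sm
# Webpage_Product_Sales is assumed to be the DataFrame loaded earlier in this series.
# In the formula, Holiday*Weekday expands to Holiday + Weekday + Holiday:Weekday,
# so it adds the Holiday:Weekday interaction on top of the main effects already listed.
model4 = sm.ols(formula='Sales ~ Server_Down_time_Sec+Holiday+Special_Discount+Online_Ad_Paid_ref_links+Social_Network_Ref_links+Month+Weekday+DayofMonth+Holiday*Weekday', data=Webpage_Product_Sales)
fitted4 = model4.fit()
fitted4.summary()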
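To see the effect of an interaction term in isolation, here is a small sketch on synthetic data. It does not use the real Webpage_Product_Sales data: the `Holiday`, `Weekday` and `Sales` columns below are generated at random, with an interaction deliberately built into `Sales`, so the numbers say nothing about the actual sales model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (assumption: not the real sales dataset).
df = pd.DataFrame({
    "Holiday": rng.integers(0, 2, n),   # 0 or 1
    "Weekday": rng.integers(1, 8, n),   # 1 to 7
})

# Sales is built so that the Weekday effect is stronger on holidays,
# which is exactly what an interaction term is meant to capture.
df["Sales"] = (100 + 20 * df["Holiday"] + 5 * df["Weekday"]
               + 15 * df["Holiday"] * df["Weekday"]
               + rng.normal(0, 10, n))

# Model with main effects only.
main_only = smf.ols("Sales ~ Holiday + Weekday", data=df).fit()

# Model with main effects plus the Holiday:Weekday interaction.
with_inter = smf.ols("Sales ~ Holiday + Weekday + Holiday:Weekday", data=df).fit()

print("Adjusted R-squared, main effects only:", main_only.rsquared_adj)
print("Adjusted R-squared, with interaction: ", with_inter.rsquared_adj)
```

On this synthetic data the adjusted R-squared rises because the interaction is truly present. On real data the gain may be small or absent, so compare the two fits rather than assuming an improvement.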
Conclusion – Regression
- Try adding polynomial and interaction terms to your regression model. Sometimes they work like a charm.
- Adjusted R-squared is a good measure of fit on the training (in-time) sample, but it cannot tell us how the final model will perform on new data. Cross-validation gives a better idea of the testing error.
- Outliers can influence the regression line, so we need to take care of data sanitization (cleaning) before building the model.
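The cross-validation point above can be sketched as follows. This is a minimal hand-rolled k-fold loop on synthetic data using only NumPy, not the method used in this series. The helper `kfold_rmse`, the variables `x1` and `x2`, and the data-generating formula are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic predictors and a response that contains a true x1*x2 interaction.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3 + 2 * x1 - x2 + 1.5 * x1 * x2 + rng.normal(scale=1.0, size=n)

# Design matrix: intercept, main effects, and the interaction term.
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])

def kfold_rmse(X, y, k=5, seed=0):
    """Average out-of-fold RMSE of an ordinary least squares fit."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Score on the held-out fold.
        resid = y[test] - X[test] @ beta
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))

rmse_main = kfold_rmse(X[:, :3], y)   # intercept + main effects
rmse_inter = kfold_rmse(X, y)         # adds the interaction column

print("CV RMSE, main effects only:", rmse_main)
print("CV RMSE, with interaction: ", rmse_inter)
```

Because each fold is scored on rows the model never saw during fitting, the averaged RMSE is an estimate of testing error, which adjusted R-squared on the training sample cannot provide.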