R-Squared
In the previous section, we studied how good our regression line is in R.
For a good model, SSE should be really low. Another way to look at this: the total variation in Y is SST = SSE + SSR, so for a really good model SSE should be minimized and SSR should be maximized.
- SSE/SST should be as small as possible.
- SSR/SST should be as large as possible.
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted R²:
R² = SSR/SST, where 0 ≤ R² ≤ 1
R-squared is the ratio of the regression sum of squares (SSR) to the total sum of squares (SST); this is also known as the explained variance. For a good model, the explained variance should be close to 100%. If the explained variance is only 50%, the model is not so good. This is how we can distinguish between a good model and a bad model: for a given model, we look at its R-squared value.
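To make the definition concrete, here is a minimal sketch that computes the three sums of squares by hand and checks them against the value reported by `summary()`. The vectors `x` and `y` are made-up toy data, not the airline data used in this post, and the variable names are only illustrative.

```r
# Hypothetical toy data (not the airline data from this post)
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

# Simple linear regression with an intercept
fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)            # total variation in y
sse <- sum(residuals(fit)^2)           # variation left unexplained by the line
ssr <- sum((fitted(fit) - mean(y))^2)  # variation explained by the line

r2_manual  <- ssr / sst                # R-squared from the definition
r2_summary <- summary(fit)$r.squared   # R-squared as reported by R

# Each of these checks should hold for a least-squares fit with an intercept
all.equal(sst, sse + ssr)
all.equal(r2_manual, r2_summary)
all.equal(r2_manual, 1 - sse / sst)
```

The last check shows the equivalent form R² = 1 - SSE/SST, which is why minimizing SSE/SST and maximizing SSR/SST are the same goal.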
Lab: R-Square
Let’s do a lab assignment on R-squared. Previously we fit a regression line between the number of passengers and the promotional budget; now let’s find its R-squared value. Go to the model object we created previously and run the summary of the model.
- What is the R-square value of Passengers vs Promotion_Budget model?
- What is the R-square value of Passengers vs Inter_metro_flight_ratio model?
Solution
- What is the R-square value of Passengers vs Promotion_Budget model?
summary(model1)
##
## Call:
## lm(formula = air$Passengers ~ air$Promotion_Budget, data = air)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5037.6 -2348.6   148.1  2569.3  4851.5
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)          1.260e+03  1.361e+03   0.925    0.358
## air$Promotion_Budget 6.953e-02  2.112e-03  32.923   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2938 on 78 degrees of freedom
## Multiple R-squared: 0.9329, Adjusted R-squared: 0.932
## F-statistic: 1084 on 1 and 78 DF, p-value: < 2.2e-16
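Inside the summary of the model, the line that starts with "Multiple R-squared" gives the value of R-squared. R² is 0.9329, so about 93% of the variation in the number of passengers is explained by the promotional budget.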
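Instead of reading the value off the printed summary, R-squared can also be pulled out of the summary object directly. The sketch below assumes a simulated stand-in data frame, `air_demo`, because the real air data is not reproduced here; the column names mirror the post, but the numbers are arbitrary and the resulting R-squared will not match 0.9329.

```r
# Simulated stand-in for the air data; values are arbitrary, for illustration only
set.seed(42)
air_demo <- data.frame(Promotion_Budget = runif(80, 300000, 1000000))
air_demo$Passengers <- 1200 + 0.07 * air_demo$Promotion_Budget +
  rnorm(80, sd = 3000)

# Column names in the formula are looked up in `data`,
# so no `air_demo$` prefix is needed
model_demo <- lm(Passengers ~ Promotion_Budget, data = air_demo)

# summary() returns a list-like object; R-squared is stored in it
model_summary <- summary(model_demo)
model_summary$r.squared      # same number as "Multiple R-squared" in the printout
model_summary$adj.r.squared  # same number as "Adjusted R-squared" in the printout
```

Extracting the number this way is handy when the value has to be stored, rounded, or compared across several models.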
- What is the R-square value of Passengers vs Inter_metro_flight_ratio model?
summary(model2)
##
## Call:
## lm(formula = air$Passengers ~ air$Inter_metro_flight_ratio, data = air)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -18199  -6815  -1409   4848  29223
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                     20441       4994   4.093 0.000103 ***
## air$Inter_metro_flight_ratio    35071       7028   4.990 3.58e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9872 on 78 degrees of freedom
## Multiple R-squared: 0.242, Adjusted R-squared: 0.2323
## F-statistic: 24.9 on 1 and 78 DF, p-value: 3.579e-06
R² is 0.242. This lets us compare the first model with a new one. In the new model we predict the number of passengers using a different variable, the inter-metro flight ratio: if there are more flights between metro cities, does the number of passengers depend on that ratio? To answer this we built a model and checked its R-squared value. The value of R-squared is only about 24%, which means only about 24% of the variation in the number of passengers can be explained by the inter-metro flight ratio. So, to predict the number of passengers, promotional budget is the better variable.
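The same comparison can be done in code by fitting each single-variable model and collecting the R-squared values side by side. This is only a sketch on simulated data: `strong_x` and `weak_x` are hypothetical predictors invented for the example, standing in for a variable that explains most of the variation and one that explains very little.

```r
# Simulated data; both predictors and the response are made up for illustration
set.seed(1)
n <- 80
demo <- data.frame(
  strong_x = runif(n, 0, 100),  # hypothetical predictor closely tied to y
  weak_x   = runif(n, 0, 1)     # hypothetical predictor loosely tied to y
)
demo$y <- 5 + 2 * demo$strong_x + 20 * demo$weak_x + rnorm(n, sd = 10)

# One simple regression per candidate predictor
models <- list(
  strong = lm(y ~ strong_x, data = demo),
  weak   = lm(y ~ weak_x, data = demo)
)

# Collect R-squared from each model summary into a named vector
r2 <- sapply(models, function(m) summary(m)$r.squared)
round(r2, 3)

# Name of the model with the highest R-squared
best_model <- names(which.max(r2))
best_model
```

With the real data, the list would hold the two models from this lab, and the larger R-squared would point to promotional budget as the better single predictor.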
The next post is about Multiple Regression in R.