# 203.1.9 Adjusted R-squared in R

##### R square for multiple variables in regression.

In the previous section, we studied Multiple Regression in R.

In the regression output summary, you may have already noticed the Adjusted R-squared field. It is good to have many variables available in the data, but it is not advisable to keep adding variables to the model in the hope that R-squared will keep improving. In fact, R-squared never decreases when a variable is added: it either increases slightly or stays the same. For example, if a model built with 10 variables has an R-squared of 80% and 10 more variables are added, R-squared will not fall below 80%; it will stay at 80% or increase slightly. This holds even if some of those extra 10 variables are junk. Adjusted R-squared corrects for this by penalizing each additional parameter in the model.
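The claim that R-squared never decreases can be checked with a small simulation. This is a minimal sketch on synthetic data, not the tutorial's dataset: the names `dat`, `junk`, `fits`, `r2` and `adj_r2` are illustrative, and the junk columns are random noise generated only for this demonstration.

```r
# Synthetic data: y depends on x1 only; the junk columns are pure noise.
set.seed(42)
n <- 50
x1 <- rnorm(n)
y <- 2 + 3 * x1 + rnorm(n)
junk <- as.data.frame(matrix(rnorm(n * 10), nrow = n))  # columns V1..V10
dat <- cbind(data.frame(y = y, x1 = x1), junk)

# Fit 11 nested models: x1 alone, then x1 plus 1, 2, ..., 10 junk columns.
fits <- lapply(0:10, function(j) {
  vars <- c("x1", names(junk)[seq_len(j)])
  lm(reformulate(vars, response = "y"), data = dat)
})

r2 <- sapply(fits, function(f) summary(f)$r.squared)
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)

# R-squared can only stay flat or go up as junk columns are added;
# adjusted R-squared is free to go down.
print(round(rbind(r2, adj_r2), 4))
```

Because each model contains all the predictors of the previous one, the `r2` row should never step down, while the `adj_r2` row can move in either direction.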

$$\bar{R}^2 = R^2 - \frac{k-1}{n-k}\left(1 - R^2\right)$$

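where n is the number of observations and k is the number of parameters in the model (the predictors plus the intercept). Every additional parameter increases the penalty that is subtracted from R-squared.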
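As a quick check of this adjustment, the sketch below writes it as an R function and compares the result with the value that `summary()` reports. The function name `adjusted_r2` is made up for this illustration, and the built-in `mtcars` data stands in for a real dataset; it assumes k counts the intercept.

```r
# Adjusted R-squared from R-squared, n observations and k parameters.
# Assumption: k counts the intercept, so k = number of predictors + 1.
adjusted_r2 <- function(r2, n, k) {
  r2 - (k - 1) / (n - k) * (1 - r2)
}

# Compare with the value computed by summary() on a built-in dataset.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
manual <- adjusted_r2(s$r.squared, n = nrow(mtcars), k = length(coef(fit)))

print(c(manual = manual, from_summary = s$adj.r.squared))
```

The two printed numbers should agree up to floating-point error.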

To understand the concept of adjusted R-squared, we will use an example with three models. First, build a model to predict Y using x1 and x2, and note down the R-squared and adjusted R-squared values. Next, build a model to predict Y using x1, x2, x3, x4, x5 and x6, and note down both values again. Finally, build a model to predict Y using x1, x2, x3, x4, x5, x6, x7 and x8, and note down both values once more. Start by loading the dataset into R, then build the first model, named m1.

adj_sample = read.csv("R dataset\\Adjusted RSquare\\Adj_Sample.csv")
attach(adj_sample)

m1<-lm(Y~x1+x2)
m2<-lm(Y~x1+x2+x3+x4+x5+x6)
m3<-lm(Y~x1+x2+x3+x4+x5+x6+x7+x8)

summary(m1)
summary(m2)
summary(m3)

detach(adj_sample)
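An alternative to `attach()` and `detach()` is to pass the data frame through the `data` argument of `lm()` and collect the two statistics in one table. The sketch below is only an illustration: `adj_sample_demo` is a hypothetical stand-in filled with random numbers (12 rows, columns Y and x1 to x8), so its output will not match the summaries shown in this post.

```r
# Hypothetical stand-in for Adj_Sample.csv: random values, illustration only.
set.seed(1)
adj_sample_demo <- as.data.frame(matrix(rnorm(12 * 9), nrow = 12))
names(adj_sample_demo) <- c("Y", paste0("x", 1:8))

# Passing data = avoids attach()/detach() and the "objects are masked" messages.
models <- list(
  m1 = lm(Y ~ x1 + x2, data = adj_sample_demo),
  m2 = lm(Y ~ x1 + x2 + x3 + x4 + x5 + x6, data = adj_sample_demo),
  m3 = lm(Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8, data = adj_sample_demo)
)

# One row per model: number of predictors, R-squared, adjusted R-squared.
comparison <- data.frame(
  model = names(models),
  predictors = sapply(models, function(m) length(coef(m)) - 1),
  r_squared = sapply(models, function(m) summary(m)$r.squared),
  adj_r_squared = sapply(models, function(m) summary(m)$adj.r.squared),
  row.names = NULL
)
print(comparison)
```

With the real data, replacing `adj_sample_demo` by the data frame read from the CSV file would give the same R-squared and adjusted R-squared values as the individual summaries.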

#### Output m1

##
## Call:
## lm(formula = Y ~ x1 + x2)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.0050 -0.2381  0.1893  0.4254  1.2321
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.015677   1.169583  -0.868    0.408
## x1           0.279345   0.282874   0.988    0.349
## x2           0.001882   0.001328   1.417    0.190
##
## Residual standard error: 0.9043 on 9 degrees of freedom
## Multiple R-squared:  0.419,  Adjusted R-squared:  0.2899
## F-statistic: 3.245 on 2 and 9 DF,  p-value: 0.08685

In the summary of m1, R-squared is about 42% and adjusted R-squared is about 29%. Looking at the p-values of the predictors, neither of the two variables is significant at the 5% level: x1 (p = 0.349) is clearly non-impactful, and x2 (p = 0.190) is only weakly related to Y.
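The p-values read off the summary can also be extracted programmatically from the coefficient table. This is a small sketch on the built-in `mtcars` data rather than the tutorial's dataset; the 0.05 cutoff and the names `p_values` and `weak` are assumptions made for the illustration.

```r
# Fit a model on a built-in dataset (stand-in for the tutorial's data).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# coef(summary(fit)) is a matrix with columns
# "Estimate", "Std. Error", "t value" and "Pr(>|t|)".
coefs <- coef(summary(fit))

# Keep the p-values of the predictors only (the first row is the intercept).
p_values <- coefs[-1, "Pr(>|t|)"]

# Predictors whose p-value exceeds the chosen cutoff (0.05 is an assumption).
weak <- names(p_values)[p_values > 0.05]

print(round(p_values, 4))
print(weak)
```

A high p-value flags a candidate for removal, but it is a screening aid rather than a rule: correlated predictors can inflate each other's p-values.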

#### Output m2

## The following objects are masked from adj_sample (pos = 3):
##
##     x1, x2, x3, x4, x5, x6, x7, x8, Y
##
## Call:
## lm(formula = Y ~ x1 + x2 + x3 + x4 + x5 + x6)
##
## Residuals:
##        1        2        3        4        5        6        7        8
##  0.25902  0.06800  0.45286  0.62004 -1.13449 -0.53961 -0.41898  0.52544
##        9       10       11       12
## -0.36028 -0.04814  0.83404 -0.25789
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.375099   4.686803  -1.147   0.3033
## x1          -0.669681   0.536981  -1.247   0.2676
## x2           0.002969   0.001518   1.956   0.1079
## x3           0.506261   0.248695   2.036   0.0974 .
## x4           0.037611   0.083834   0.449   0.6725
## x5           0.043624   0.168830   0.258   0.8064
## x6           0.051554   0.087708   0.588   0.5822
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8468 on 5 degrees of freedom
## Multiple R-squared:  0.7169, Adjusted R-squared:  0.3773
## F-statistic: 2.111 on 6 and 5 DF,  p-value: 0.2149

In the summary of m2, R-squared increased from about 42% to about 72%, whereas adjusted R-squared rose only from about 29% to about 38%. The gap between the two has widened because most of the added variables are non-impactful: x4, x5 and x6 all have p-values above 0.5, and only x3 (p = 0.0974) is marginally significant. Adjusted R-squared charges a penalty for every extra parameter, so variables that add little explanatory power hold it down.

#### Output m3

## The following objects are masked from adj_sample (pos = 3):
##
##     x1, x2, x3, x4, x5, x6, x7, x8, Y
## The following objects are masked from adj_sample (pos = 4):
##
##     x1, x2, x3, x4, x5, x6, x7, x8, Y
##
## Call:
## lm(formula = Y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)
##
## Residuals:
##       1       2       3       4       5       6       7       8       9
##  0.4989  0.4490 -0.1764  0.3267 -0.8213 -0.6679 -0.2299  0.2323 -0.2973
##      10      11      12
##  0.3333  0.6184 -0.2658
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.0439629 19.9031715   0.856    0.455
## x1          -0.0955943  0.7614799  -0.126    0.908
## x2           0.0007376  0.0025362   0.291    0.790
## x3           0.5157015  0.3062833   1.684    0.191
## x4           0.0578632  0.1033356   0.560    0.615
## x5           0.0858136  0.1914803   0.448    0.684
## x6          -0.1746565  0.2197152  -0.795    0.485
## x7          -0.0323678  0.1530067  -0.212    0.846
## x8          -0.2321183  0.2065655  -1.124    0.343
##
## Residual standard error: 0.9071 on 3 degrees of freedom
## Multiple R-squared:  0.8051, Adjusted R-squared:  0.2855
## F-statistic: 1.549 on 8 and 3 DF,  p-value: 0.3927

In the summary of m3, R-squared increased from about 72% to about 81%, whereas adjusted R-squared dropped from about 38% to about 29%. This happens when the added predictors are barely related to the target variable: the small gain in R-squared does not cover the penalty for the extra parameters.
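The adjusted values reported in the three summaries can be reproduced from the R-squared values alone. The sketch below assumes n = 12 observations (the residual degrees of freedom plus the number of parameters in each output) and counts the intercept in k; the helper `adjusted_r2` and the data frame `reported` are names introduced for this illustration.

```r
# Adjusted R-squared from R-squared, n observations and k parameters
# (k includes the intercept).
adjusted_r2 <- function(r2, n, k) r2 - (k - 1) / (n - k) * (1 - r2)

# Values copied from the three summary() outputs above.
reported <- data.frame(
  model = c("m1", "m2", "m3"),
  k = c(3, 7, 9),                         # predictors + intercept
  r2 = c(0.4190, 0.7169, 0.8051),
  adj_reported = c(0.2899, 0.3773, 0.2855)
)

# Recompute the adjusted value and the penalty subtracted from R-squared.
reported$adj_recomputed <- adjusted_r2(reported$r2, n = 12, k = reported$k)
reported$penalty <- reported$r2 - reported$adj_recomputed

print(reported)
```

The recomputed values agree with the reported ones to about three decimal places; the small differences come from the rounded R-squared inputs. The penalty column grows with every model, which is why adjusted R-squared ends up falling even though R-squared keeps rising.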

If the values of R-squared and adjusted R-squared are close, it suggests that there are few or no junk variables in the model; that is, the predictors we are using contribute meaningfully to explaining the target variable. If the difference between R-squared and adjusted R-squared is large, we can conclude that some of the variables are not useful for this particular model.

The next post is a practice session on Multiple Regression Issues.

