204.1.4 How good is my Regression Line

Link to the previous post : https://statinfer.com/204-1-3-practice-regression-line-fitting/

In this post we will understand the mathematics behind a good regression line.

Take an (x, y) point from the data.
Suppose we substitute x into the regression line and get a prediction $y_{pred}$.
If the regression line is a good fit, then we expect $y_{pred} = y$, or $(y - y_{pred}) = 0$.
If we repeat this at every value of x, we get one error value $(y - y_{pred})$ per data point.

Some of these errors will be positive and some negative, so simply adding them up would let them cancel each other out. To avoid that, we square each error and add up the squares:

$SSE = \sum (y - \hat{y})^2$, where $\hat{y}$ is the predicted value $y_{pred}$.
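As a rough illustration of the sum of squared errors, here is a minimal Python sketch. The data points are made up for this example, and fitting the line with `np.polyfit` is an assumption; any least-squares line fit would serve the same purpose.

```python
import numpy as np

# Hypothetical toy data, invented purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Prediction and error at every x.
y_pred = slope * x + intercept
errors = y - y_pred

# Errors have mixed signs, so we square them before summing.
sse = np.sum(errors ** 2)

print("errors:", errors)
print("sum of errors:", errors.sum())
print("SSE:", sse)
```

For a least-squares line with an intercept, the raw errors sum to roughly zero, which is why the plain sum is not a useful measure and the squared sum is used instead.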

For a good model, SSE should be zero or close to zero.
On its own, however, an SSE value does not mean much. For example, SSE = 100 is very small when y varies in the thousands, but the same value is very large when y varies in decimals.
So we have to take the variance of y into account when measuring the accuracy of the regression line.
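The scale dependence of SSE can be seen in a small sketch. The data and the helper name `sse_of_line_fit` are hypothetical, introduced only for this example; the two datasets have exactly the same shape, but one is measured in decimals and the other in thousands.

```python
import numpy as np

def sse_of_line_fit(x, y):
    """Fit a least-squares line to (x, y) and return its SSE.

    Hypothetical helper for this illustration only.
    """
    slope, intercept = np.polyfit(x, y, deg=1)
    y_pred = slope * x + intercept
    return float(np.sum((y - y_pred) ** 2))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Same pattern, two different scales of y (made-up numbers).
y_small = np.array([0.2, 0.4, 0.5, 0.4, 0.5])   # y varies in decimals
y_large = y_small * 10000                        # y varies in thousands

sse_small = sse_of_line_fit(x, y_small)
sse_large = sse_of_line_fit(x, y_large)

print("SSE (decimals): ", sse_small)
print("SSE (thousands):", sse_large)
```

The line fits both datasets equally well in relative terms, yet the two SSE values differ by many orders of magnitude. This is why SSE needs to be compared against the total variation in y.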
Error sum of squares (SSE, Sum of Squares of Error): $SSE = \sum (y - \hat{y})^2$

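Total variance in Y (SST, Sum of Squares Total):

$SST = \sum (y - \bar{y})^2$

$SST = \sum (y - \hat{y} + \hat{y} - \bar{y})^2$

$SST = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2 + 2 \sum (y - \hat{y})(\hat{y} - \bar{y})$

For a least-squares line with an intercept, the cross term $2 \sum (y - \hat{y})(\hat{y} - \bar{y})$ is zero, so:

$SST = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$

$SST = SSE + \sum (\hat{y} - \bar{y})^2$

$SST = SSE + SSR$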
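The decomposition can be checked numerically. The sketch below uses synthetic data (the true slope, intercept, noise level, and random seed are arbitrary assumptions) and again assumes the line is fitted by least squares with `np.polyfit`.

```python
import numpy as np

# Synthetic data: a linear trend plus random noise (arbitrary choices).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, size=50)

# Least-squares line with an intercept.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept
y_bar = y.mean()

sst = np.sum((y - y_bar) ** 2)      # total sum of squares
sse = np.sum((y - y_hat) ** 2)      # error sum of squares
ssr = np.sum((y_hat - y_bar) ** 2)  # regression sum of squares

# Cross term from expanding the square; it should be close to zero.
cross = 2 * np.sum((y - y_hat) * (y_hat - y_bar))

print("SST:      ", sst)
print("SSE + SSR:", sse + ssr)
print("cross term:", cross)
```

Up to floating-point rounding, SST and SSE + SSR agree. The identity relies on the line being the least-squares fit; for an arbitrary line the cross term is generally not zero.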

So, total variance in Y is divided into two parts,
- Variance that can’t be explained by x (error)
- Variance that can be explained by x, using regression

In summary:

$SST = SSE + SSR$

Total Sum of Squares = Sum of Squares Error + Sum of Squares Regression

- $SST = \sum (y - \bar{y})^2$
- $SSE = \sum (y - \hat{y})^2$
- $SSR = \sum (\hat{y} - \bar{y})^2$

In the next session we will look at R-squared, which is a statistical measure of how close the data points are to the fitted regression line.

The next post is about R-squared in Python.

Link to the next post : https://statinfer.com/204-1-5-r-squared-in-python/

21st June 2017