Link to the previous post: https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/
In the previous post we looked at descriptive statistics, specifically the mean and median. In this post we will look at dispersion measures and implement them in Python.
This post builds on the previous ones; we continue with the data imported in 104.3.1 and 104.3.2.
Dispersion Measures: Variance and Standard Deviation
Dispersion
- Just knowing the central tendency is not enough.
- Two variables can have the same mean and still be very different.
- Look at these two variables: the profits of two companies, A and B, over the last 14 quarters, in millions (MM).
Company A | Company B |
---|---|
43 | 17 |
44 | 15 |
0 | 12 |
25 | 17 |
20 | 15 |
35 | 18 |
-8 | 12 |
13 | 15 |
-10 | 12 |
-8 | 13 |
32 | 18 |
11 | 18 |
-8 | 14 |
21 | 14 |
- The average profit is 15 in both cases.
- Yet Company B has performed far more consistently than Company A.
- Company A even made losses in some quarters.
- Measures of dispersion are vital in such cases.
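To make the comparison concrete, here is a minimal Python sketch using the figures from the table above. The names `company_a`, `company_b` and `mean` are chosen here for illustration and are not part of the earlier posts' code.

```python
# Quarterly profits (in MMs), copied from the table above
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


mean_a = mean(company_a)
mean_b = mean(company_b)

# Smallest and largest quarterly profit for each company
range_a = (min(company_a), max(company_a))
range_b = (min(company_b), max(company_b))

# Number of quarters in which Company A made a loss
loss_quarters_a = sum(1 for profit in company_a if profit < 0)

print("Mean A:", mean_a, "| Mean B:", mean_b)
print("Range A:", range_a, "| Range B:", range_b)
print("Loss-making quarters for A:", loss_quarters_a)
```

Both means come out to 15, yet Company A swings between -10 and 44 with four loss-making quarters, while Company B stays between 12 and 18.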
Variance and Standard deviation
- Dispersion quantifies how far the data points deviate from the mean.
- Variance is the average of the squared distances of each point from the mean.
- Variance is a fairly good measure of dispersion.
- The variance in profit is about 352 for Company A and about 4.9 for Company B.
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
Variance Calculation
Company A value | Value – Mean | (Value – Mean)^2 |
---|---|---|
43 | 28 | 784 |
44 | 29 | 841 |
0 | -15 | 225 |
25 | 10 | 100 |
20 | 5 | 25 |
35 | 20 | 400 |
-8 | -23 | 529 |
13 | -2 | 4 |
-10 | -25 | 625 |
-8 | -23 | 529 |
32 | 17 | 289 |
11 | -4 | 16 |
-8 | -23 | 529 |
21 | 6 | 36 |
Mean = 15 | | Variance ≈ 352 |
Company B value | Value – Mean | (Value – Mean)^2 |
---|---|---|
17 | 2 | 4 |
15 | 0 | 0 |
12 | -3 | 9 |
17 | 2 | 4 |
15 | 0 | 0 |
18 | 3 | 9 |
12 | -3 | 9 |
15 | 0 | 0 |
12 | -3 | 9 |
13 | -2 | 4 |
18 | 3 | 9 |
18 | 3 | 9 |
14 | -1 | 1 |
14 | -1 | 1 |
Mean = 15 | | Variance ≈ 4.9 |
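The table arithmetic can be checked with a few lines of plain Python. This is a sketch that assumes the population form of the formula (dividing by n); `population_variance` is a hypothetical helper name, and the profit figures are copied from the two-company table at the start of this post.

```python
# Quarterly profits (in MMs), copied from the two-company table
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]


def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    # One entry per data point: the "(Value - Mean)^2" column of the table
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / n


var_a = population_variance(company_a)
var_b = population_variance(company_b)

print(round(var_a, 1), round(var_b, 1))  # 352.3 4.9
```

The squared deviations sum to 4932 for Company A and 68 for Company B; dividing each by 14 gives the two variances.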
Standard Deviation
- Standard deviation is just the square root of the variance.
- Variance gives a good idea of dispersion, but it is expressed in squared units.
- As the formula shows, the units of variance are the square of the units of the original data.
- Standard deviation is the dispersion measure expressed in the same units as the original data.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
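As a quick illustration of the units point, the sketch below takes the square root of the variance for the two companies. It uses Python's standard `statistics` module rather than the pandas calls used later in this post, simply so that it runs on its own; the list names are chosen here for illustration.

```python
import math
import statistics

# Quarterly profits (in MMs), copied from the two-company table
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

# Population variance (divides by n), matching the formula in this post
var_a = statistics.pvariance(company_a)
var_b = statistics.pvariance(company_b)

# Standard deviation is the square root of the variance
sd_a = math.sqrt(var_a)
sd_b = math.sqrt(var_b)

# statistics.pstdev computes the same quantity directly
assert math.isclose(sd_a, statistics.pstdev(company_a))
assert math.isclose(sd_b, statistics.pstdev(company_b))

print(f"Company A: variance {var_a:.1f} (MM squared), SD {sd_a:.1f} (MM)")
print(f"Company B: variance {var_b:.1f} (MM squared), SD {sd_b:.1f} (MM)")
```

A standard deviation of roughly 18.8 MM against a mean of 15 MM is easier to interpret than a variance of 352 "squared MM"; for Company B the standard deviation is only about 2.2 MM.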
Variance and Standard deviation in Python
- Divide the Income data into two sets: USA vs. others.
- Find the variance of “education-num” in those two sets. Which one has the higher variance?
In [12]:
# Rows whose native-country is United-States (note the leading space in the value)
usa_income = Income_Data[Income_Data["native-country"] == ' United-States']
usa_income.shape
In [13]:
# All remaining rows
other_income = Income_Data[Income_Data["native-country"] != ' United-States']
other_income.shape
- Variance and SD for USA
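In [14]:
# Variance of education-num for the USA subset (pandas divides by n - 1 by default, ddof=1)
var_usa = usa_income["education-num"].var()
var_usa
In [15]:
# Standard deviation: the square root of the variance above
std_usa = usa_income["education-num"].std()
std_usa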
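- Variance and SD for other countries
In [16]:
# Variance of education-num for the non-USA subset
var_other = other_income["education-num"].var()
var_other
In [17]:
# Standard deviation for the non-USA subset
std_other = other_income["education-num"].std()
std_other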
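One caveat when comparing these results with the formula in this post: pandas `var()` and `std()` use `ddof=1` by default, that is, they divide by n - 1 (the sample variance), whereas the formula here divides by n. The sketch below shows the difference on the Company A profits, using the standard library's `statistics` module only so that it runs without the Income data; the variable names are chosen for illustration.

```python
import statistics

# Quarterly profits of Company A (in MMs), copied from the two-company table
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
n = len(company_a)

# Population variance: divides by n, as in the formula in this post
population_var = statistics.pvariance(company_a)

# Sample variance: divides by n - 1, which is what pandas .var() returns by default
sample_var = statistics.variance(company_a)

# The two differ only by the factor n / (n - 1)
ratio = sample_var / population_var

print(f"n = {n}")
print(f"population variance (divide by n):     {population_var:.2f}")
print(f"sample variance     (divide by n - 1): {sample_var:.2f}")
print(f"ratio sample / population:             {ratio:.4f}")
```

In pandas, the population version would be `usa_income["education-num"].var(ddof=0)` and `.std(ddof=0)`. For a dataset with many rows the factor n / (n - 1) is very close to 1, so the two versions are nearly identical.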
Practice: Variance and Standard deviation
- Dataset: “./Online Retail Sales Data/Online Retail.csv”
- What are the variance and s.d. of “UnitPrice”?
- What are the variance and s.d. of “Quantity”?
- Which of these two variables is more consistent?
In [18]:
# Variance of UnitPrice
var_UnitPrice = Retail['UnitPrice'].var()
var_UnitPrice
In [19]:
# Standard deviation of UnitPrice
std_UnitPrice = Retail['UnitPrice'].std()
std_UnitPrice
In [20]:
# Variance of Quantity
var_quantity = Retail['Quantity'].var()
var_quantity
In [21]:
# Standard deviation of Quantity
std_quantity = Retail['Quantity'].std()
std_quantity