Link to the previous post: https://statinfer.com/104-3-2-descriptive-statistics-mean-and-median/

In the previous post, we looked at descriptive statistics (mean and median). In this post, we will look at measures of dispersion and implement them in Python.

This post extends the previous posts; we will continue working with the data imported in 104.3.1 and 104.3.2.

- Knowing the central tendency alone is not enough.
- Two variables might have the same mean but still be very different.
- Consider the profits of two companies, A and B, over the last 14 quarters (in millions):

Company A | Company B |
---|---|
43 | 17 |
44 | 15 |
0 | 12 |
25 | 17 |
20 | 15 |
35 | 18 |
-8 | 12 |
13 | 15 |
-10 | 12 |
-8 | 13 |
32 | 18 |
11 | 18 |
-8 | 14 |
21 | 14 |

- The average profit is 15 in both cases.
- However, Company B has performed far more consistently than Company A.
- Company A even made losses in four of the 14 quarters.
- Measures of dispersion are vital in such cases.
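A quick check of the claims above, using the 14 quarterly profits from the table. This is a minimal plain-Python sketch; the variable names are our own, not from the original notebook.

```python
# Quarterly profits (in millions) for the last 14 quarters, from the table above
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

# Same central tendency...
mean_a = sum(company_a) / len(company_a)
mean_b = sum(company_b) / len(company_b)

# ...but a very different spread between the worst and best quarter
range_a = (min(company_a), max(company_a))
range_b = (min(company_b), max(company_b))

# Number of quarters in which each company made a loss
losses_a = sum(1 for p in company_a if p < 0)
losses_b = sum(1 for p in company_b if p < 0)

print(mean_a, mean_b)      # 15.0 15.0
print(range_a, range_b)    # (-10, 44) (12, 18)
print(losses_a, losses_b)  # 4 0
```

The means alone cannot tell the two companies apart; the minimum and maximum already hint at how differently their profits are spread.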

- Dispersion quantifies how far the data points deviate from the mean.
- Variance is the average of the squared distances of each point from the mean.
- Variance is a fairly good measure of dispersion.
- The variance in profit is 352 for Company A and 4.9 for Company B.

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
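The formula translates directly into code. A minimal sketch that divides by n exactly as the formula does (the population variance); `population_variance` is a helper name made up for illustration.

```python
def population_variance(values):
    """Average squared distance of each value from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

var_a = population_variance(company_a)  # 4932 / 14
var_b = population_variance(company_b)  # 68 / 14
print(round(var_a, 1), round(var_b, 1))  # 352.3 4.9
```

Python's standard `statistics.pvariance` computes the same quantity.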

**Variance Calculation**

Company A (mean = 15):

Value | Value – Mean | (Value – Mean)^2 |
---|---|---|
43 | 28 | 784 |
44 | 29 | 841 |
0 | -15 | 225 |
25 | 10 | 100 |
20 | 5 | 25 |
35 | 20 | 400 |
-8 | -23 | 529 |
13 | -2 | 4 |
-10 | -25 | 625 |
-8 | -23 | 529 |
32 | 17 | 289 |
11 | -4 | 16 |
-8 | -23 | 529 |
21 | 6 | 36 |

Sum of squared deviations = 4932, so the variance is 4932 / 14 ≈ 352.

Company B (mean = 15):

Value | Value – Mean | (Value – Mean)^2 |
---|---|---|
17 | 2 | 4 |
15 | 0 | 0 |
12 | -3 | 9 |
17 | 2 | 4 |
15 | 0 | 0 |
18 | 3 | 9 |
12 | -3 | 9 |
15 | 0 | 0 |
12 | -3 | 9 |
13 | -2 | 4 |
18 | 3 | 9 |
18 | 3 | 9 |
14 | -1 | 1 |
14 | -1 | 1 |

Sum of squared deviations = 68, so the variance is 68 / 14 ≈ 4.9.
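The Company A table above can be reproduced with pandas, which also exposes one subtlety: `Series.var()` and `DataFrame.var()` divide by n − 1 by default (`ddof=1`, the sample variance), so they return 352 for Company A only when `ddof=0` is passed. A sketch assuming pandas is installed; the column names are our own.

```python
import pandas as pd

profits = pd.DataFrame({
    "company_a": [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21],
    "company_b": [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14],
})

# Rebuild the "Value - Mean" and "(Value - Mean)^2" columns for Company A
table_a = pd.DataFrame({"value": profits["company_a"]})
table_a["deviation"] = table_a["value"] - table_a["value"].mean()
table_a["squared_deviation"] = table_a["deviation"] ** 2
print(table_a["squared_deviation"].sum())  # 4932.0

# Divide by n, as in the variance formula (population variance)
pop_var = profits.var(ddof=0)
# pandas default: divide by n - 1 (sample variance)
sample_var = profits.var()

print(pop_var.round(1))     # company_a ≈ 352.3, company_b ≈ 4.9
print(sample_var.round(1))  # company_a ≈ 379.4, company_b ≈ 5.2
```

With only 14 observations, the n versus n − 1 choice visibly changes the result; for large datasets the difference is negligible.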

- Standard deviation is simply the square root of the variance.
- Variance gives a good idea of dispersion, but it is expressed in squared units.
- This is clear from the formula: because each deviation is squared, the units of variance are the square of the original units (for the profit data, millions squared).
- Standard deviation is the measure of dispersion that is in the same units as the original data.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
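Taking the square root brings the measure back to the original units (millions). A minimal sketch using the standard `math` and `statistics` modules: `statistics.pstdev` divides by n like the formula, while `statistics.stdev` would divide by n − 1.

```python
import math
import statistics

company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

# Population standard deviation: square root of the population variance
sd_a = statistics.pstdev(company_a)
sd_b = statistics.pstdev(company_b)

# Same result as taking the square root of the variance by hand
assert math.isclose(sd_a, math.sqrt(statistics.pvariance(company_a)))

print(round(sd_a, 2), round(sd_b, 2))  # 18.77 2.2
```

A typical quarter for Company A lands roughly 19 million away from the mean, versus about 2 million for Company B.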

- Divide the Income data into two sets: USA vs. others.
- Find the variance of “education-num” in those two sets. Which one has the higher variance?
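The split can be sanity-checked on a tiny frame. The rows below are hypothetical, invented for illustration rather than taken from the Income data; the sketch mirrors the boolean-mask filtering used on `Income_Data`, including the leading space in ' United-States', and checks that the two subsets together cover every row.

```python
import pandas as pd

# Hypothetical stand-in for Income_Data, with only the two columns used here
toy_income = pd.DataFrame({
    "native-country": [" United-States", " Mexico", " United-States", " India", " United-States"],
    "education-num": [13, 5, 9, 14, 10],
})

is_usa = toy_income["native-country"] == " United-States"
usa_toy = toy_income[is_usa]     # USA rows
other_toy = toy_income[~is_usa]  # every other country

# The two subsets partition the data: no row is lost or duplicated
assert len(usa_toy) + len(other_toy) == len(toy_income)

print(usa_toy.shape, other_toy.shape)  # (3, 2) (2, 2)
```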

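```python
# Keep only the rows whose native country is the USA
# (values in this column carry a leading space, hence ' United-States')
usa_income = Income_Data[Income_Data["native-country"] == ' United-States']
usa_income.shape  # (rows, columns) of the USA subset
```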

```python
# All remaining rows: every native country other than the USA
other_income = Income_Data[Income_Data["native-country"] != ' United-States']
other_income.shape  # (rows, columns) of the non-USA subset
```

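- Variance and SD for the USA. Note that pandas' `var()` and `std()` divide by n − 1 by default (`ddof=1`, the sample variance) rather than by n as in the formulas above; pass `ddof=0` to match those formulas exactly.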

```python
# Variance of years of education within the USA subset
var_usa = usa_income["education-num"].var()
var_usa
```

```python
# Standard deviation of years of education within the USA subset
std_usa = usa_income["education-num"].std()
std_usa
```

- Variance and SD for the other countries

```python
# Variance of years of education for all non-USA rows
var_other = other_income["education-num"].var()
var_other
```

```python
# Standard deviation of years of education for all non-USA rows
std_other = other_income["education-num"].std()
std_other
```
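The four variance and SD calculations above can also be condensed into a single `groupby`. A sketch on a hypothetical toy frame (values invented for illustration); on the real data, `Income_Data` would take the place of `toy_income`. `agg("var")` and `agg("std")` use pandas' default `ddof=1`, so they match the `.var()` and `.std()` calls above; the population versions are added for comparison with the formulas earlier in the post.

```python
import pandas as pd

# Hypothetical stand-in for Income_Data
toy_income = pd.DataFrame({
    "native-country": [" United-States", " Mexico", " United-States", " India", " United-States"],
    "education-num": [13, 5, 9, 14, 10],
})

# Label each row as USA or Other, then summarise both groups at once
group = toy_income["native-country"].eq(" United-States").map({True: "USA", False: "Other"})
by_group = toy_income.groupby(group)["education-num"]
summary = by_group.agg(["var", "std"])  # sample versions (divide by n - 1)

# Population versions (divide by n), matching the formulas in this post
summary["var_pop"] = by_group.var(ddof=0)
summary["std_pop"] = by_group.std(ddof=0)

print(summary)
```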

- Dataset: “./Online Retail Sales Data/Online Retail.csv”
- What are the variance and SD of “UnitPrice”?
- What are the variance and SD of “Quantity”?
- Which of these two variables is more consistent?

```python
# Variance of the unit price (pandas default: divides by n - 1)
var_UnitPrice = Retail['UnitPrice'].var()
var_UnitPrice
```

```python
# Standard deviation of the unit price, in the same units as UnitPrice
std_UnitPrice = Retail['UnitPrice'].std()
std_UnitPrice
```

```python
# Variance of the quantity per transaction line
var_quantity = Retail['Quantity'].var()
var_quantity
```

```python
# Standard deviation of the quantity, in the same units as Quantity
std_quantity = Retail['Quantity'].std()
std_quantity
```
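Both Retail columns can also be summarised in one call with `DataFrame.agg`. The sketch uses a few made-up rows as a stand-in for `Retail` (the numbers are hypothetical). Keep in mind that UnitPrice and Quantity are in different units (a price versus a count), so comparing their raw standard deviations says little about which is more consistent; dividing each standard deviation by its column mean, as done here, is one unit-free way to compare them, offered as our own suggestion rather than part of the original exercise.

```python
import pandas as pd

# Hypothetical stand-in for Retail; the real frame has many more rows and columns
retail_toy = pd.DataFrame({
    "UnitPrice": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Quantity": [10, 10, 12, 8, 10],
})

cols = ["UnitPrice", "Quantity"]

# Variance and standard deviation of both columns (pandas default ddof=1)
spread = retail_toy[cols].agg(["var", "std"])

# Standard deviation relative to the mean: a unit-free view of consistency
relative_spread = spread.loc["std"] / retail_toy[cols].mean()

print(spread)
print(relative_spread)
```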