
# 104.3.3 Dispersion Measures in Python

##### Variance and Standard Deviation
In the previous post we covered descriptive statistics. In this post we look at dispersion measures and implement them in Python.
This post builds on the previous ones, so we continue with the data imported in 104.3.1 and 104.3.2.

## Dispersion Measures : Variance and Standard Deviation

### Dispersion

• Just knowing the central tendency is not enough.
• Two variables might have the same mean and still be very different.
• Look at these two variables: the profits of two companies, A and B, over the last 14 quarters (in millions).
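| Company A | Company B |
|---|---|
| 43 | 17 |
| 44 | 15 |
| 0 | 12 |
| 25 | 17 |
| 20 | 15 |
| 35 | 18 |
| -8 | 12 |
| 13 | 15 |
| -10 | 12 |
| -8 | 13 |
| 32 | 18 |
| 11 | 18 |
| -8 | 14 |
| 21 | 14 |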
• The average profit is 15 in both cases.
• Yet Company B has performed far more consistently than Company A.
• Company A even made a loss in four of the 14 quarters.
• Measures of dispersion capture exactly this kind of difference.
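The point of the bullets above can be checked with a few lines of Python. This is only a small sketch using the standard library; the variable names `company_a` and `company_b` and the helper `summarize` are chosen here for illustration and are not part of the earlier posts.

```python
from statistics import mean

# Quarterly profits copied from the table above (in millions).
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]


def summarize(profits):
    """Return the mean, the extremes, and the number of loss-making quarters."""
    return {
        "mean": mean(profits),
        "min": min(profits),
        "max": max(profits),
        "loss_quarters": sum(1 for p in profits if p < 0),
    }


summary_a = summarize(company_a)
summary_b = summarize(company_b)
print("Company A:", summary_a)
print("Company B:", summary_b)
```

Both summaries report the same mean, but the minimum, maximum and loss count differ sharply, which is the gap a dispersion measure is meant to quantify.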

### Variance and Standard deviation

• Dispersion quantifies how far the data points deviate from the mean.
• Variance is the average of the squared distances of each point from the mean.
• Variance is a fairly good measure of dispersion.
• The variance in profit is about 352 for Company A and about 4.9 for Company B.
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Variance Calculation

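Company A:

| Value | Value – Mean | (Value – Mean)^2 |
|---|---|---|
| 43 | 28 | 784 |
| 44 | 29 | 841 |
| 0 | -15 | 225 |
| 25 | 10 | 100 |
| 20 | 5 | 25 |
| 35 | 20 | 400 |
| -8 | -23 | 529 |
| 13 | -2 | 4 |
| -10 | -25 | 625 |
| -8 | -23 | 529 |
| 32 | 17 | 289 |
| 11 | -4 | 16 |
| -8 | -23 | 529 |
| 21 | 6 | 36 |
| Mean: 15 | | Variance: 352 |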

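Company B:

| Value | Value – Mean | (Value – Mean)^2 |
|---|---|---|
| 17 | 2 | 4 |
| 15 | 0 | 0 |
| 12 | -3 | 9 |
| 17 | 2 | 4 |
| 15 | 0 | 0 |
| 18 | 3 | 9 |
| 12 | -3 | 9 |
| 15 | 0 | 0 |
| 12 | -3 | 9 |
| 13 | -2 | 4 |
| 18 | 3 | 9 |
| 18 | 3 | 9 |
| 14 | -1 | 1 |
| 14 | -1 | 1 |
| Mean: 15 | | Variance: 4.9 |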
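The variance calculation above can be reproduced step by step in plain Python. The function name `population_variance` is made up for this sketch; it divides by n, as in the formula in this post, and uses the profit figures listed earlier.

```python
def population_variance(values):
    """Average of the squared deviations from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / n


# Quarterly profits from the two-company table earlier in the post.
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

var_a = population_variance(company_a)
var_b = population_variance(company_b)
print(round(var_a, 1))  # 352.3
print(round(var_b, 1))  # 4.9
```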

### Standard Deviation

• Standard deviation is simply the square root of the variance.
• Variance gives a good idea of dispersion, but it is expressed in squared units.
• As the formula shows, the units of variance are the square of the units of the original data.
• Standard deviation is the dispersion measure that is in the same units as the original data.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
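As a quick illustration of the square-root relationship, the sketch below takes the two-company profit figures from earlier in the post and compares `math.sqrt` of the population variance with `statistics.pstdev` from the standard library. The variable names are chosen here for illustration only.

```python
import math
from statistics import pstdev, pvariance

# Quarterly profits from the two-company table earlier in the post (in millions).
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

# Standard deviation as the square root of the population variance.
sd_a = math.sqrt(pvariance(company_a))
sd_b = math.sqrt(pvariance(company_b))

# pstdev computes the same quantity directly.
print(round(sd_a, 1), round(pstdev(company_a), 1))
print(round(sd_b, 1), round(pstdev(company_b), 1))
```

Unlike the variances, these values are in the same units as the profits (millions), so they can be read directly against the mean of 15.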

### Variance and Standard deviation in Python

• Divide the Income data into two sets: USA vs. others.
• Find the variance of “education-num” in those two sets. Which one has the higher variance?
In :
usa_income=Income_Data[Income_Data["native-country"]==' United-States']
usa_income.shape

Out:
(29170, 15)
In :
other_income=Income_Data[Income_Data["native-country"]!=' United-States']
other_income.shape

Out:
(3391, 15)
• Variance and SD for USA
In :
var_usa=usa_income["education-num"].var()
var_usa

Out:
5.735862879538104
In :
std_usa=usa_income["education-num"].std()
std_usa

Out:
2.394966154152936
• Variance and SD for other countries
In :
var_other=other_income["education-num"].var()
var_other

Out:
13.567613037808737
In :
std_other=other_income["education-num"].std()
std_other

Out:
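3.6834240914954033
• The non-US group has the higher variance in “education-num”: about 13.57, against about 5.74 for the USA.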
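One detail is worth keeping in mind when comparing these outputs with the formula earlier in the post: pandas' `.var()` and `.std()` use `ddof=1` by default, so they divide by n − 1 (the sample version), while the formula divides by n (the population version). The standard library makes the same distinction, which the sketch below shows on the Company B profits; the variable names are illustrative.

```python
from statistics import pvariance, variance

# Company B quarterly profits from the table earlier in the post.
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

population_var = pvariance(company_b)  # divides by n
sample_var = variance(company_b)       # divides by n - 1

print(round(population_var, 2))
print(round(sample_var, 2))
```

With only 14 values the two results differ noticeably; with tens of thousands of rows, as in the Income data, the difference is negligible. In pandas, passing `ddof=0`, for example `usa_income["education-num"].var(ddof=0)`, gives the population version.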

### Practice : Variance and Standard deviation

• Dataset: “./Online Retail Sales Data/Online Retail.csv”
• What are the variance and standard deviation of “UnitPrice”?
• What are the variance and standard deviation of “Quantity”?
• Which of these two variables is more consistent?
In :
var_UnitPrice=Retail['UnitPrice'].var()
var_UnitPrice

Out:
9362.469164424467
In :
std_UnitPrice=Retail['UnitPrice'].std()
std_UnitPrice

Out:
96.75985306119716
In :
var_quantity=Retail['Quantity'].var()
var_quantity

Out:
47559.39140913822
In :
std_quantity=Retail['Quantity'].std()
std_quantity

Out:
218.08115784986612

The next post is about percentiles and quartiles in Python.
Link to the next post : https://statinfer.com/104-3-4-percentiles-quartiles-in-python/
9th April 2019