
104.3.3 Dispersion Measures in Python

Variance and Standard Deviation
In the previous post we looked at descriptive statistics. In this post we will look at dispersion measures and implement them in Python.
This post is a continuation of the previous posts; we will keep working with the data imported in 104.3.1 and 104.3.2.

Dispersion Measures : Variance and Standard Deviation

Dispersion

  • Knowing the central tendency alone is not enough.
  • Two variables might have the same mean and still be very different.
  • Look at these two variables: the profits of two companies, A and B, for the last 14 quarters (in MMs).
Company A Company B
43 17
44 15
0 12
25 17
20 15
35 18
-8 12
13 15
-10 12
-8 13
32 18
11 18
-8 14
21 14
  • The average profit is 15 in both cases.
  • Yet Company B has performed far more consistently than Company A.
  • Company A even made losses in some quarters.
  • Measures of dispersion are essential in such cases.
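As a quick check of the claim above, the following minimal sketch recomputes the mean of each profit column with Python's standard `statistics` module and prints the minimum, maximum and range alongside it. The variable names `company_a` and `company_b` are illustrative only and do not come from the earlier posts.

```python
from statistics import mean

# Quarterly profits (in MMs) copied from the table above.
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

for name, profits in (("Company A", company_a), ("Company B", company_b)):
    # The range (max - min) is a crude first look at how spread out the data is.
    spread = max(profits) - min(profits)
    print(f"{name}: mean={mean(profits)}, min={min(profits)}, "
          f"max={max(profits)}, range={spread}")
```

Both means come out the same, while the range differs widely, which is the gap that variance and standard deviation are meant to quantify.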

Variance and Standard deviation

  • Dispersion quantifies how far the data points deviate from the mean.
  • Variance is the average of the squared distances of each point from the mean.
  • Variance is a fairly good measure of dispersion.
  • The variance in profit is about 352 for Company A and about 4.9 for Company B.
$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

Variance Calculation

Company A
Value Value – Mean (Value – Mean)^2
43 28 784
44 29 841
0 -15 225
25 10 100
20 5 25
35 20 400
-8 -23 529
13 -2 4
-10 -25 625
-8 -23 529
32 17 289
11 -4 16
-8 -23 529
21 6 36
Mean = 15; Variance = 4932 / 14 ≈ 352

Company B
Value Value – Mean (Value – Mean)^2
17 2 4
15 0 0
12 -3 9
17 2 4
15 0 0
18 3 9
12 -3 9
15 0 0
12 -3 9
13 -2 4
18 3 9
18 3 9
14 -1 1
14 -1 1
Mean = 15; Variance = 68 / 14 ≈ 4.9
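The variance calculation can be reproduced step by step. The sketch below uses the Company A and Company B profits listed earlier and builds the same three columns (value, deviation from the mean, squared deviation) before averaging the last one. The helper name `deviation_table` is made up for this illustration.

```python
def deviation_table(values):
    """Return (mean, rows, variance), where each row is
    (value, value - mean, (value - mean) ** 2)."""
    n = len(values)
    avg = sum(values) / n
    rows = [(x, x - avg, (x - avg) ** 2) for x in values]
    # Population variance: average of the squared deviations (divide by n).
    variance = sum(sq for _, _, sq in rows) / n
    return avg, rows, variance


# Quarterly profits (in MMs) from the two-company table.
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

for name, profits in (("Company A", company_a), ("Company B", company_b)):
    avg, rows, variance = deviation_table(profits)
    print(name)
    print(f"{'Value':>6} {'Value - Mean':>13} {'(Value - Mean)^2':>17}")
    for value, dev, sq in rows:
        print(f"{value:>6} {dev:>13.0f} {sq:>17.0f}")
    print(f"Mean = {avg:.0f}, Variance = {variance:.1f}\n")
```

The function divides by n, as in the formula for σ², so it gives the population variance rather than the sample variance.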

Standard Deviation

  • Standard deviation is just the square root of the variance.
  • Variance gives a good idea of dispersion, but it is expressed in squared units.
  • As the formula shows, the units of variance are the square of the units of the original data.
  • Standard deviation is a dispersion measure in the same units as the original data.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$
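To make the units point concrete, here is a small sketch that takes the square root of the population variance for the two profit series and checks it against `statistics.pstdev`. The list names are illustrative, and the standard-library functions stand in for the pandas calls used later in the post.

```python
from math import sqrt
from statistics import pstdev, pvariance

# Quarterly profits (in MMs) from the two-company table.
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]
company_b = [17, 15, 12, 17, 15, 18, 12, 15, 12, 13, 18, 18, 14, 14]

for name, profits in (("Company A", company_a), ("Company B", company_b)):
    var = pvariance(profits)  # units: MMs squared
    sd = sqrt(var)            # units: MMs, the same as the data
    # pstdev computes the same quantity directly.
    assert abs(sd - pstdev(profits)) < 1e-9
    print(f"{name}: variance={var:.1f}, standard deviation={sd:.1f}")
```

Because the standard deviation is in MMs, it can be read directly against the mean profit of 15, which the variance cannot.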

Variance and Standard deviation in Python

  • Divide the Income data into two sets: USA vs. others.
  • Find the variance of “education-num” in those two sets. Which one has the higher variance?
In [12]:
usa_income=Income_Data[Income_Data["native-country"]==' United-States']
usa_income.shape
Out[12]:
(29170, 15)
In [13]:
other_income=Income_Data[Income_Data["native-country"]!=' United-States']
other_income.shape
Out[13]:
(3391, 15)
  • Variance and SD for the USA, then for the other countries
In [14]:
var_usa=usa_income["education-num"].var()
var_usa
Out[14]:
5.735862879538104
In [15]:
std_usa=usa_income["education-num"].std()
std_usa
Out[15]:
2.394966154152936
In [16]:
var_other=other_income["education-num"].var()
var_other
Out[16]:
13.567613037808737
In [17]:
std_other=other_income["education-num"].std()
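std_other
Out[17]:
3.6834240914954033
  • The other countries have the higher variance in “education-num” (about 13.57, versus about 5.74 for the USA).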
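One detail worth keeping in mind: pandas `.var()` and `.std()` default to `ddof=1`, so they divide by n − 1 (the sample variance), whereas the formula earlier in this post divides by n (the population variance). With thousands of rows the difference is negligible, but on small data it is visible. The sketch below illustrates the two conventions on the 14 Company A profits using the standard-library `statistics` module rather than pandas; the variable names are illustrative.

```python
from statistics import pvariance, variance

# Company A quarterly profits (in MMs) from the earlier table.
company_a = [43, 44, 0, 25, 20, 35, -8, 13, -10, -8, 32, 11, -8, 21]

population_var = pvariance(company_a)  # divides by n, as in the formula
sample_var = variance(company_a)       # divides by n - 1, like pandas' default ddof=1

print(f"population variance (n):     {population_var:.2f}")
print(f"sample variance     (n - 1): {sample_var:.2f}")

# With a pandas Series s, the population versions would be
# s.var(ddof=0) and s.std(ddof=0).
```

Either convention is fine for comparing the two groups, as long as the same one is used for both.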

Practice : Variance and Standard deviation

  • Dataset: “./Online Retail Sales Data/Online Retail.csv”
  • What are the variance and s.d. of “UnitPrice”?
  • What are the variance and s.d. of “Quantity”?
  • Which one of these two variables is more consistent?
In [18]:
var_UnitPrice=Retail['UnitPrice'].var()
var_UnitPrice
Out[18]:
9362.469164424467
In [19]:
std_UnitPrice=Retail['UnitPrice'].std()
std_UnitPrice 
Out[19]:
96.75985306119716
In [20]:
var_quantity=Retail['Quantity'].var()
var_quantity
Out[20]:
47559.39140913822
In [21]:
std_quantity=Retail['Quantity'].std()
std_quantity
Out[21]:
218.08115784986612


The next post is about percentiles and quartiles in Python.
Link to the next post : https://statinfer.com/104-3-4-percentiles-quartiles-in-python/
