
# 104.3.2 Descriptive Statistics : Mean and Median

##### Basics of descriptive statistics.

Link to the previous post :  https://test.statinfer.com/104-3-1-data-sampling-in-python/

Descriptive statistics

• Basic descriptive statistics give us an idea of the variables and their distributions.
• They permit the analyst to describe many pieces of data with a few indices.
• They can also help us find outliers in the dataset, which is important to do before cleaning the data.
• Central tendencies:
  • Mean
  • Median
• Dispersion:
  • Range
  • Variance
  • Standard deviation
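
As a rough illustration of how a few indices can summarize a variable, the sketch below computes each measure listed above with Python's standard `statistics` module. The `values` list is made up for the example and is not taken from any dataset used in this series.

```python
import statistics

# Made-up sample values, used only for illustration.
values = [4, 8, 15, 16, 23, 42]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "range": max(values) - min(values),
    "variance": statistics.variance(values),  # sample variance (n - 1 denominator)
    "std_dev": statistics.stdev(values),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Six numbers are reduced to five indices here; on a real dataset with thousands of rows the same five indices would still fit on five lines.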

### Central Tendencies

• Mean
  • The arithmetic mean
  • Sum of values / count of values
  • Gives a quick idea of the average of a variable
• Median
  • The mean is not a good measure in the presence of outliers
  • For example, consider the data vector below:
  • 1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1
  • 10 of the 11 values (about 90%) are less than 2, but the mean of the vector is 2
  • There is one unusual value in the vector, namely 9
  • Such a value is known as an outlier
  • The mean is not the true middle value in the presence of outliers; it is strongly affected by them
  • In such cases we use the median, the true middle value
  • To find it, sort the data in ascending or descending order and take the middle value

| Numbers | Sorted Numbers |
|---------|----------------|
| 1.5 | 0.7 |
| 1.7 | 0.8 |
| 1.9 | 0.8 |
| 0.8 | 1.1 |
| 0.8 | 1.2 |
| 1.2 | 1.4 |
| 1.9 | 1.5 |
| 1.4 | 1.7 |
| 9 | 1.9 |
| 0.7 | 1.9 |
| 1.1 | 9 |

  • The mean of the data is 2
  • The median of the data is 1.4, the 6th of the 11 sorted values
  • Even if the outlier were 90 instead of 9, the median would stay the same
  • The median is a positional measure, so it does not depend on how extreme the outliers are
  • When there are no outliers and the data is roughly symmetric, the mean and median will be nearly equal
  • A large gap between the mean and the median is a hint that outliers may be present in the data
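
The worked example above can be checked with a few lines of Python. This is a minimal sketch using the standard `statistics` module rather than pandas; the variable names are chosen for the illustration.

```python
import statistics

# The data vector from the example above.
data = [1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1]

mean_value = statistics.mean(data)
median_value = statistics.median(data)  # sorts internally, then takes the middle value
print(round(mean_value, 2), median_value)

# Replace the outlier 9 with 90: the mean shifts, the median does not.
data_90 = [90 if x == 9 else x for x in data]
mean_90 = statistics.mean(data_90)
median_90 = statistics.median(data_90)
print(round(mean_90, 2), median_90)
```

The first line printed shows a mean of about 2 and a median of 1.4. With the outlier changed to 90, the mean jumps to roughly 9.36 while the median remains 1.4.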

### Mean and Median in Python

In :
```
# Mean of the "capital-gain" column
gain_mean = Income_Data["capital-gain"].mean()
gain_mean
```
Out:
`1077.6488437087312`
In :
```
# Median of the "capital-gain" column
gain_median = Income_Data["capital-gain"].median()
gain_median
```
Out:
`0.0`

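The mean (about 1077.65) is far away from the median (0.0). A median of 0 means that at least half of the values are zero, so the mean is being pulled up by a relatively small number of large values. This suggests outliers; to confirm, we need to look at percentiles and a box plot.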
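
To see how a zero median and a large mean can coexist, here is a small sketch on made-up numbers (they are not the real "capital-gain" values). It uses `statistics.quantiles` from the standard library to compute deciles; on a pandas column, `Series.quantile` plays a similar role.

```python
import statistics

# Made-up values mimicking a variable that is zero for most rows,
# with a few large entries; not taken from the real dataset.
gains = [0] * 90 + [3000, 4000, 5000, 7000, 8000,
                    10000, 12000, 15000, 20000, 25000]

mean_gain = statistics.mean(gains)
median_gain = statistics.median(gains)

# Nine cut points that split the data into ten equal-sized groups (deciles).
deciles = statistics.quantiles(gains, n=10)

print("mean:", mean_gain, "median:", median_gain)
print("deciles:", deciles)
```

In this toy data the median and most of the deciles are zero, and only the top cut point is positive. The mean is driven entirely by the top 10% of values.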

### Practice : Mean and Median in Python

• Dataset: “./Online Retail Sales Data/Online Retail.csv”
• What is the mean of “UnitPrice”?
• What is the median of “UnitPrice”?
• Is the mean equal to the median? Do you suspect the presence of outliers in the data?
• What is the mean of “Quantity”?
• What is the median of “Quantity”?
• Is the mean equal to the median? Do you suspect the presence of outliers in the data?
In :
```
import pandas as pd

# Forward slashes work on Windows as well as on Linux/macOS
Retail = pd.read_csv("datasets/Online Retail Sales Data/Online Retail.csv", encoding="ISO-8859-1")
Retail.shape
```
Out:
`(541909, 8)`
In :
```
UnitPrice_mean = Retail["UnitPrice"].mean()
UnitPrice_mean
```
Out:
`4.611113626083471`
In :
```
UnitPrice_median = Retail["UnitPrice"].median()
UnitPrice_median
```
Out:
`2.08`
In :
```
Quantity_mean = Retail["Quantity"].mean()
Quantity_mean
```
Out:
`9.55224954743324`
In :
```
Quantity_median = Retail["Quantity"].median()
Quantity_median
```
Out:
`3.0`

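Yes. For both variables the mean is well above the median (about 4.61 vs 2.08 for “UnitPrice”, and about 9.55 vs 3.0 for “Quantity”), so it looks like outliers are present in both.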
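
When the same comparison is needed for several columns, it can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original exercise: `mean_vs_median` is a hypothetical helper name, and `sample` is a tiny made-up DataFrame standing in for the retail data.

```python
import pandas as pd

# Made-up stand-in for the retail data; values are illustrative only.
sample = pd.DataFrame({
    "UnitPrice": [1.25, 2.10, 2.55, 0.85, 3.75, 295.00],
    "Quantity": [6, 2, 12, 1, 4, 1500],
})

def mean_vs_median(df, columns):
    """Return the mean, the median and their gap for each listed column."""
    summary = df[columns].agg(["mean", "median"]).T  # one row per column
    summary["gap"] = summary["mean"] - summary["median"]
    return summary

report = mean_vs_median(sample, ["UnitPrice", "Quantity"])
print(report)
```

A large positive `gap` flags columns whose mean is pulled upward by a few big values. On the real data the call would be `mean_vs_median(Retail, ["UnitPrice", "Quantity"])`.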

The next post is on dispersion measures in Python.

Link to the next post : https://test.statinfer.com/104-3-3-dispersion-measures-in-python/