
# 104.3.2 Descriptive Statistics : Mean and Median

##### Basics of descriptive statistics.

Link to the previous post: https://statinfer.com/104-3-1-data-sampling-in-python/

### Descriptive Statistics

• Basic descriptive statistics give us an idea of the variables and their distributions.
• They let the analyst describe many pieces of data with a few indices.
• They can also help us find outliers in the dataset, which is important before cleaning the data.
• Central tendencies:
  • Mean
  • Median
• Dispersion:
  • Range
  • Variance
  • Standard deviation
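For reference, here are the usual formulas for the measures listed above, for a sample $x_1, \dots, x_n$ whose values sorted in ascending order are written $x_{(1)} \le \dots \le x_{(n)}$. The variance shown is the sample variance, which divides by $n - 1$; this matches the default (`ddof=1`) of pandas' `Series.var()` and `Series.std()`, while the population variance divides by $n$ instead.

```latex
% Central tendency
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\qquad
\text{median} =
\begin{cases}
  x_{((n+1)/2)} & n \text{ odd} \\
  \tfrac{1}{2}\left( x_{(n/2)} + x_{(n/2+1)} \right) & n \text{ even}
\end{cases}

% Dispersion
\text{range} = x_{(n)} - x_{(1)}
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
\qquad
s = \sqrt{s^2}
```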

### Central Tendencies

• Mean
  • The arithmetic mean
  • Sum of values / count of values
  • Gives a quick idea of the average of a variable
• Median
  • The mean is not a good measure in the presence of outliers
  • For example, consider the data vector below:
    • 1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1
  • About 90% of these values (10 of the 11) are less than 2, yet the mean of the vector is 2
  • The vector contains one unusual value, 9
  • Such a value is known as an outlier
  • The mean is not the true middle value when outliers are present; it is strongly affected by them
  • In such cases we use the median, the true middle value
  • To find the median, sort the data in either ascending or descending order and pick the middle value (for an even count, average the two middle values):
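
| Numbers (original order) | Sorted numbers (ascending) |
|---|---|
| 1.5 | 0.7 |
| 1.7 | 0.8 |
| 1.9 | 0.8 |
| 0.8 | 1.1 |
| 0.8 | 1.2 |
| 1.2 | 1.4 |
| 1.9 | 1.5 |
| 1.4 | 1.7 |
| 9 | 1.9 |
| 0.7 | 1.9 |
| 1.1 | 9 |
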
• Mean of the data is 2
• Median of the data is 1.4 (the 6th of the 11 sorted values)
• Even if the outlier were 90 instead of 9, the median would still be 1.4
• The median is a positional measure, so it is largely unaffected by outliers
• When there are no outliers (and the distribution is roughly symmetric), the mean and median are nearly equal
• A large gap between the mean and the median suggests outliers (or a skewed distribution) in the data
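The mean-versus-median comparison above can be checked with a few lines of plain Python. This is a minimal sketch using only built-ins: `mean` and `median` are illustrative helper names defined here (the standard library's `statistics.mean` and `statistics.median` do the same job), and `data_90` is the example vector with the outlier 9 replaced by 90.

```python
# Hypothetical helpers for illustration; statistics.mean / statistics.median are the stdlib equivalents
def mean(values):
    """Arithmetic mean: sum of values / count of values."""
    return sum(values) / len(values)


def median(values):
    """Middle of the sorted values; for an even count, average the two middle values."""
    ordered = sorted(values)  # ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1]
# Same vector with the outlier 9 replaced by 90
data_90 = [90 if x == 9 else x for x in data]

print(round(mean(data), 2), median(data))        # 2.0 1.4
print(round(mean(data_90), 2), median(data_90))  # 9.36 1.4
```

Replacing 9 with 90 pushes the mean from about 2 to about 9.36, while the median stays at 1.4 because only the position of the largest value matters.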

### Mean and Median in Python

```python
# Assumes Income_Data is a pandas DataFrame loaded earlier in this series
gain_mean = Income_Data["capital-gain"].mean()
gain_mean  # 1077.6488437087312
```

```python
gain_median = Income_Data["capital-gain"].median()
gain_median  # 0.0
```

The mean (about 1077.65) is far from the median (0.0), which suggests outliers or a heavily skewed distribution. To confirm, we need to look at the percentiles and a box plot.
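One way to follow up with percentiles and a box plot is to compute the quartiles and apply the 1.5 × IQR rule (Tukey's fences), which is the rule many box-plot implementations, including matplotlib's default `whis=1.5`, use to decide which points are drawn beyond the whiskers. This is a sketch assuming pandas; `iqr_outliers` is a hypothetical helper name, and it is demonstrated on the small example vector from the Central Tendencies section rather than on Income_Data.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1 = s.quantile(0.25)  # 25th percentile
    q3 = s.quantile(0.75)  # 75th percentile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]


values = pd.Series([1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1])

# A few percentiles give a first look at the tail of the distribution
print(values.quantile([0.25, 0.5, 0.75, 0.9]))
print(iqr_outliers(values).tolist())  # [9.0]
```

The same helper could be applied to a column such as `Income_Data["capital-gain"]`; when most values are identical (its median is 0.0), Q1 and Q3 may coincide, so it is worth inspecting the percentiles before trusting the fences.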

### Practice: Mean and Median in Python

• Dataset: “./Online Retail Sales Data/Online Retail.csv”
• What is the mean of “UnitPrice”?
• What is the median of “UnitPrice”?
• Is the mean equal to the median? Do you suspect the presence of outliers in the data?
• What is the mean of “Quantity”?
• What is the median of “Quantity”?
• Is the mean equal to the median? Do you suspect the presence of outliers in the data?
```python
import pandas as pd

# Read the Online Retail data as ISO-8859-1 rather than the default UTF-8
Retail = pd.read_csv("datasets/Online Retail Sales Data/Online Retail.csv", encoding="ISO-8859-1")
Retail.shape  # (541909, 8)
```

```python
UnitPrice_mean = Retail["UnitPrice"].mean()
UnitPrice_mean  # 4.611113626083471
```

```python
UnitPrice_median = Retail["UnitPrice"].median()
UnitPrice_median  # 2.08
```

```python
Quantity_mean = Retail["Quantity"].mean()
Quantity_mean  # 9.55224954743324
```

```python
Quantity_median = Retail["Quantity"].median()
Quantity_median  # 3.0
```

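Yes. For both variables the mean is well above the median (UnitPrice: about 4.61 vs. 2.08; Quantity: about 9.55 vs. 3.0), so it looks like outliers are present in both.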
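To answer the practice questions for several columns at once, the means and medians can be collected into one small table. This is a sketch assuming pandas; `center_summary` and the `gap` column are hypothetical names, and the `toy` DataFrame holds made-up values used only to show the shape of the result.

```python
import pandas as pd


def center_summary(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One row per column with its mean, median, and mean - median gap."""
    summary = df[columns].agg(["mean", "median"]).T  # transpose: columns become rows
    summary["gap"] = summary["mean"] - summary["median"]
    return summary


# Made-up values for illustration only, not the Online Retail data
toy = pd.DataFrame({
    "UnitPrice": [1.0, 2.0, 2.0, 3.0, 50.0],  # one extreme price pulls the mean up
    "Quantity": [1, 2, 3, 4, 5],              # symmetric, so mean equals median
})
print(center_summary(toy, ["UnitPrice", "Quantity"]))
```

Calling `center_summary(Retail, ["UnitPrice", "Quantity"])` on the real data should reproduce the mean and median values computed above, since `agg("mean")` and `agg("median")` perform the same reductions as `.mean()` and `.median()`.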

The next post is on dispersion measures in Python.

Link to the next post: https://statinfer.com/104-3-3-dispersion-measures-in-python/


Statinfer Software Solutions LLP
