Link to the previous post: https://statinfer.com/104-3-4-percentiles-quartiles-in-python/
In this post, we will discuss the basics of box plots and how they help us identify outliers.
We will continue with the same Python session from the 104 series of blog posts, i.e., the same datasets.
Box plots and Outlier Detection
- A box plot draws a box from the lower quartile (LQ) to the upper quartile (UQ), with the median marked inside the box.
- It gives a graphical five-number summary of the data: minimum, LQ, median, UQ, and maximum.
- It gives us an idea of how the data are distributed.
- It helps us identify outliers easily.
- 25% of the population is below the first quartile (LQ).
- 75% of the population is below the third quartile (UQ).
- If the box is pushed to one side and some values lie far away from it, that is a clear indication of outliers.
- In this example, the minimum is 5, the maximum is 120, and 75% of the values are less than 15.
- Still, some records reach 120, which is a clear indication of outliers.
- Sometimes the outliers are so extreme that the box appears as a flat horizontal line in the box plot.
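As a rough illustration of the five-number summary, the sketch below computes it with NumPy for a small, made-up array chosen to resemble the example described above (minimum 5, maximum 120, 75% of the values below 15). The numbers are hypothetical and are not the data behind the original figure.

```python
import numpy as np

# Hypothetical values: mostly small numbers plus two extreme records
values = np.array([5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 90, 120])

# Five-number summary: minimum, lower quartile, median, upper quartile, maximum
five_num = {
    "min": values.min(),
    "LQ": np.percentile(values, 25),
    "median": np.median(values),
    "UQ": np.percentile(values, 75),
    "max": values.max(),
}

for name, v in five_num.items():
    print(f"{name}: {v}")
```

Here the UQ stays below 15 while the maximum is 120, which is exactly the pattern that shows up as isolated points far above the box.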
Box plots and outlier detection in Python
In [30]:
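import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# 'bank' carries over from the earlier 104 posts; reload it if starting a fresh session
bank = pd.read_csv("./Bank Marketing/bank_market.csv")
plt.boxplot(bank.balance)  # box plot of the balance column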
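By default, `plt.boxplot` uses `whis=1.5`: the whiskers stop at the most extreme data points within 1.5 × IQR of the box (IQR = UQ - LQ), and anything beyond that range is drawn as an individual point. The sketch below applies the same rule with pandas so the flagged values can be listed instead of read off the plot. The helper name `iqr_outliers` and the sample values are hypothetical; with the real data you would pass `bank.balance`.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of s lying outside [LQ - k*IQR, UQ + k*IQR].

    k=1.5 mirrors matplotlib's default whis=1.5 for plt.boxplot.
    """
    lq, uq = s.quantile(0.25), s.quantile(0.75)
    iqr = uq - lq
    lower, upper = lq - k * iqr, uq + k * iqr
    # Points strictly beyond the fences are the ones drawn as separate dots
    return s[(s < lower) | (s > upper)]


# Hypothetical values; with the bank data, call iqr_outliers(bank.balance)
sample = pd.Series([5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 90, 120])
print(iqr_outliers(sample).tolist())
```

Raising `k` (for example to 3) keeps only the more extreme points, which can help when a variable such as balance flags a large share of its records.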
Practice: Box plots and outlier detection
- Dataset: “./Bank Marketing/bank_market.csv”
- Draw a box plot for balance variable
- Do you suspect any outliers in balance?
- Get relevant percentiles and see their distribution.
- Draw a box plot for age variable
- Do you suspect any outliers in age?
- Get relevant percentiles and see their distribution.
In [31]:
# Draw a box plot for balance variable
plt.boxplot(bank.balance)
Outliers are present in the balance variable.
In [32]:
#Get relevant percentiles and see their distribution
bank['balance'].quantile([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
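The deciles show where most of the balance values sit. A sharp jump between the 90th percentile and the maximum is another hint of outliers, and finer percentiles in the top tail make that jump easier to see. The sketch below uses a made-up, right-skewed Series in place of `bank.balance`, so the printed values are illustrative only.

```python
import pandas as pd

# Hypothetical right-skewed data: 0..99 plus one extreme record
s = pd.Series(list(range(100)) + [5000])

# Deciles, as in the original cell
deciles = s.quantile([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
print(deciles)

# Finer percentiles in the top tail: the jump from the 99th percentile to the max stands out
tail = s.quantile([0.9, 0.95, 0.99, 1])
print(tail)
```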
In [33]:
# Draw a box plot for age variable
plt.boxplot(bank.age)
No outliers are present in the age variable.
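To double-check a visual reading like this one, matplotlib exposes the statistics it uses to draw the box through `matplotlib.cbook.boxplot_stats`. The sketch below runs it on a small, hypothetical set of ages (not the bank data); with the real dataset you would pass `bank.age` instead. An empty `fliers` array means no point falls outside the 1.5 × IQR whiskers.

```python
import numpy as np
from matplotlib import cbook

# Hypothetical ages; with the real data you would pass bank.age
ages = np.array([25, 30, 32, 35, 38, 40, 41, 45, 50, 55, 60])

# One dict of statistics per dataset; whis=1.5 matches plt.boxplot's default
stats = cbook.boxplot_stats(ages, whis=1.5)[0]

print("LQ:", stats["q1"], "median:", stats["med"], "UQ:", stats["q3"])
print("whiskers:", stats["whislo"], stats["whishi"])
print("outliers:", stats["fliers"])
```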
In [34]:
#Get relevant percentiles and see their distribution
bank['age'].quantile([0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1])
The next post is about creating graphs in Python.
Link to the next post: https://statinfer.com/104-3-6-creating-graphs-in-python/