Link to the previous post: https://statinfer.com/104-3-1-data-sampling-in-python/
Descriptive statistics
Numbers | Sorted Numbers |
---|---|
1.5 | 0.7 |
1.7 | 0.8 |
1.9 | 0.8 |
0.8 | 1.1 |
0.8 | 1.2 |
1.2 | 1.4 |
1.9 | 1.5 |
1.4 | 1.7 |
9 | 1.9 |
0.7 | 1.9 |
1.1 | 9 |
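The sorted column makes the median easy to read off. With 11 values, the median is the 6th sorted value, 1.4. The mean is 22.0 / 11 = 2.0, because the single large value 9 pulls it upward. The short sketch below reproduces these numbers with pandas, the library used in the rest of this post. It then drops the 9 and recomputes both measures, which shows that the mean reacts strongly to one outlier while the median barely moves.

```python
import pandas as pd

# The 11 numbers from the table above (note the single large value, 9)
numbers = pd.Series([1.5, 1.7, 1.9, 0.8, 0.8, 1.2, 1.9, 1.4, 9, 0.7, 1.1])

# Median by hand: the middle (6th of 11) value of the sorted numbers
sorted_numbers = numbers.sort_values()
middle = sorted_numbers.iloc[len(numbers) // 2]

mean_all = numbers.mean()      # pulled up by the 9
median_all = numbers.median()  # same value as `middle`

# Drop the outlier and recompute both measures
trimmed = numbers[numbers != 9]
mean_trimmed = trimmed.mean()
median_trimmed = trimmed.median()

print(f"with 9:    mean={mean_all:.2f}, median={median_all:.2f}")
print(f"without 9: mean={mean_trimmed:.2f}, median={median_trimmed:.2f}")
# with 9:    mean=2.00, median=1.40
# without 9: mean=1.30, median=1.30
```

Without the outlier, the mean and median agree at 1.3. With the 9 included, the mean jumps to 2.0 while the median only moves to 1.4.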
```python
# Income_Data is the income DataFrame loaded in the previous post
gain_mean = Income_Data["capital-gain"].mean()
gain_median = Income_Data["capital-gain"].median()
print(gain_mean, gain_median)
```
The mean is far from the median, which suggests that outliers are present. To confirm this, we need to look at the percentiles and a box plot.
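Below is a minimal sketch of that percentile check. It uses a small hypothetical series, made up purely for illustration, in place of the real `capital-gain` column: mostly zeros plus two large amounts. `Series.quantile` returns the requested percentiles, and a big jump between the upper percentiles and the maximum points to a few extreme values dragging the mean up. The last step applies the 1.5 × IQR rule that box-plot whiskers commonly use (for example, matplotlib's default `whis=1.5`) to flag those points.

```python
import pandas as pd

# Hypothetical, made-up values standing in for a skewed column such as capital-gain:
# mostly zeros plus two large amounts
gains = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 2000, 50000])

print("mean:", gains.mean())      # mean: 5200.0
print("median:", gains.median())  # median: 0.0

# Percentiles: a big jump near the top end signals a few extreme values
percentiles = gains.quantile([0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 1.0])
print(percentiles)

# Box-plot style check: flag points more than 1.5 * IQR beyond the quartiles
q1, q3 = gains.quantile(0.25), gains.quantile(0.75)
iqr = q3 - q1
outliers = gains[(gains < q1 - 1.5 * iqr) | (gains > q3 + 1.5 * iqr)]
print(outliers)
```

Here every percentile up to the 75th is 0, and only the top tail is non-zero. That is why the mean (5200) ends up so far from the median (0).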
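Next, check the Online Retail data. Are there outliers in UnitPrice and Quantity? Compare the mean and median of each.
```python
import pandas as pd

# Load the Online Retail data; the file is read as ISO-8859-1 (Latin-1) instead of the default UTF-8
Retail = pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding="ISO-8859-1")
print(Retail.shape)  # (rows, columns)

# Mean and median of UnitPrice
UnitPrice_mean = Retail["UnitPrice"].mean()
UnitPrice_median = Retail["UnitPrice"].median()
print(UnitPrice_mean, UnitPrice_median)

# Mean and median of Quantity, stored under their own names so the UnitPrice results are not overwritten
Quantity_mean = Retail["Quantity"].mean()
Quantity_median = Retail["Quantity"].median()
print(Quantity_mean, Quantity_median)
```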
Yes, it looks like outliers are present here as well.
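The same mean-versus-median comparison can be run over several columns at once. This is a minimal sketch that assumes a small hypothetical DataFrame. The rows are made up, and only the column names are borrowed from the retail data. The helper name `mean_vs_median` is also hypothetical. It relies on `DataFrame.agg` to compute both measures per column, then adds their ratio.

```python
import pandas as pd

def mean_vs_median(df, columns):
    """Hypothetical helper: mean and median of each column side by side, plus their ratio."""
    summary = df[columns].agg(["mean", "median"]).T
    # A mean several times larger than the median hints at large values in the upper tail
    summary["mean_to_median"] = summary["mean"] / summary["median"]
    return summary

# Hypothetical, made-up rows that only reuse the retail column names
sample = pd.DataFrame({
    "UnitPrice": [1.25, 2.10, 2.55, 3.75, 4.95, 250.00],
    "Quantity": [1, 2, 4, 6, 12, 600],
})

print(mean_vs_median(sample, ["UnitPrice", "Quantity"]))
```

The ratio is only meaningful when the median is positive. With a zero or negative median, as in the capital-gain sketch, compare the raw mean and median instead.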
The next post covers dispersion measures in Python.
Link to the next post: https://statinfer.com/104-3-3-dispersion-measures-in-python/