• No products in the cart.

104.2.6 Sorting the data in python

Sorting the dataset using pandas in python

Link to the previous post : https://statinfer.com/104-2-5-subsetting-data-with-variable-filter-condition-in-python/

In the previous post we created subsets of data by condition filtering, in this post we will create the new subsets by sorting one or more column values.

Sorting the data

We will use Online retail dataset.

In [10]:
Online_Retail=pd.read_csv("datasets\\Online Retail Sales Data\\Online Retail.csv", encoding = "ISO-8859-1")
Online_Retail.head(5)
Out[10]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
0 536365 85123A WHITE HANGING HEART T-LIGHT HOLDER 6 12/1/2010 8:26 2.55 17850.0 United Kingdom
1 536365 71053 WHITE METAL LANTERN 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
2 536365 84406B CREAM CUPID HEARTS COAT HANGER 8 12/1/2010 8:26 2.75 17850.0 United Kingdom
3 536365 84029G KNITTED UNION FLAG HOT WATER BOTTLE 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
4 536365 84029E RED WOOLLY HOTTIE WHITE HEART. 6 12/1/2010 8:26 3.39 17850.0 United Kingdom
In [78]:
Online_Retail.columns.values
Out[78]:
array(['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'InvoiceDate',
       'UnitPrice', 'CustomerID', 'Country'], dtype=object)
  • Its ascending by default
In [11]:
Online_Retail_sort=Online_Retail.sort('UnitPrice')
Online_Retail_sort.head(5)
Out[11]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
299984 A563187 B Adjust bad debt 1 8/12/2011 14:52 -11062.06 NaN United Kingdom
299983 A563186 B Adjust bad debt 1 8/12/2011 14:51 -11062.06 NaN United Kingdom
40984 539750 22652 TRAVEL SEWING KIT 1 12/21/2010 15:40 0.00 NaN United Kingdom
52217 540696 84562A NaN 1 1/11/2011 9:14 0.00 NaN United Kingdom
52262 540699 POST NaN 1000 1/11/2011 9:32 0.00 NaN United Kingdom
  • we can use ascending=False for descending order
In [12]:
Online_Retail_sort=Online_Retail.sort('UnitPrice',ascending=False)
Online_Retail_sort.head(5)

Out[12]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
222681 C556445 M Manual -1 6/10/2011 15:31 38970.00 15098.0 United Kingdom
524602 C580605 AMAZONFEE AMAZON FEE -1 12/5/2011 11:36 17836.46 NaN United Kingdom
43702 C540117 AMAZONFEE AMAZON FEE -1 1/5/2011 9:55 16888.02 NaN United Kingdom
43703 C540118 AMAZONFEE AMAZON FEE -1 1/5/2011 9:57 16453.71 NaN United Kingdom
15017 537632 AMAZONFEE AMAZON FEE 1 12/7/2010 15:08 13541.33 NaN United Kingdom

Practice : Sorting the data

  • We will use AutoDataset for this practice which we imported in our previous posts.
  • Sort the dataset based on length.
  • Sort the dataset based on length descending.
In [83]:
auto_data.columns.values
Out[83]:
array([' symboling', ' normalized-losses', ' make', ' fuel-type',
       ' aspiration', ' num-of-doors', ' body-style', ' drive-wheels',
       ' engine-location', ' wheel-base', ' length', ' width', ' height',
       ' curb-weight', ' engine-type', ' num-of-cylinders', ' engine-size',
       ' fuel-system', ' bore', ' stroke', ' compression-ratio',
       ' horsepower', ' peak-rpm', ' city-mpg', ' highway-mpg', ' price',
       'area'], dtype=object)
In [84]:
auto_data_sort=auto_data.sort( ' length')
auto_data_sort.head(5)

Out[84]:
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price area
18 2 121 chevrolet gas std two hatchback fwd front 88.4 2bbl 2.91 3.03 9.5 48 5100 47 53 5151.0 452643.156
30 2 137 honda gas std two hatchback fwd front 86.6 1bbl 2.91 3.41 9.6 58 4800 49 54 6479.0 469388.952
31 2 137 honda gas std two hatchback fwd front 86.6 1bbl 2.91 3.41 9.2 76 6000 31 38 6855.0 469388.952
34 1 101 honda gas std two hatchback fwd front 93.7 1bbl 2.91 3.41 9.2 76 6000 30 34 7129.0 504960.000
33 1 101 honda gas std two hatchback fwd front 93.7 1bbl 2.91 3.41 9.2 76 6000 30 34 6529.0 504960.000

5 rows × 27 columns

In [85]:
auto_data_sort1=auto_data.sort( ' length',ascending=False)
auto_data_sort1.head(5)

Out[85]:
symboling normalized-losses make fuel-type aspiration num-of-doors body-style drive-wheels engine-location wheel-base fuel-system bore stroke compression-ratio horsepower peak-rpm city-mpg highway-mpg price area
73 0 ? mercedes-benz gas std four sedan rwd front 120.9 mpfi 3.8 3.35 8.0 184 4500 14 16 40960.0 846007.659
70 -1 93 mercedes-benz diesel turbo four sedan rwd front 115.6 idi 3.58 3.64 21.5 123 4350 22 25 31600.0 817837.446
71 -1 ? mercedes-benz gas std four sedan rwd front 115.6 mpfi 3.46 3.1 8.3 155 4750 16 18 34184.0 820742.730
48 0 ? jaguar gas std four sedan rwd front 113.0 mpfi 3.63 4.17 8.1 176 4750 15 19 35550.0 733506.048
47 0 145 jaguar gas std four sedan rwd front 113.0 mpfi 3.63 4.17 8.1 176 4750 15 19 32250.0 733506.048

5 rows × 27 columns

In next post we will learn how to identify and remove duplicates in python.

Link to the next post : https://statinfer.com/104-2-7-identifying-and-removing-duplicate-values-from-dataset-in-python/

Statinfer

Statinfer derived from Statistical inference. We provide training in various Data Analytics and Data Science courses and assist candidates in securing placements.

Contact Us

info@statinfer.com

+91- 9676098897

+91- 9494762485

 

Our Social Links

top
© 2020. All Rights Reserved.