Statinfer

103.2.5 Sorting of Data

In a previous post, we covered Calculated Fields in R.

Sorting is a fundamental part of data analysis. A user might want to arrange names in alphabetical order, or arrange income data in descending order to find the highest taxpayers. For example, a bank that wants to find the highest taxpayers in a city would sort the data by Income in descending order and then select the top rows of the sorted list. In R, the built-in function order() does this work: it returns the row positions that would put one or more variables in ascending (or descending) order, and those positions are then used to rearrange the rows of the data frame. The syntax for sorting a data frame is:

>Newdata <- Olddata[order(variables),]
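To see what order() actually returns, here is a minimal sketch on a small made-up data frame. The `incomes` data frame and its values are hypothetical and only stand in for a real data set such as Online_Retail. It shows that order() returns positions rather than sorted values, and that indexing the rows with those positions keeps the original row names.

```r
# Hypothetical toy data frame (not the Online_Retail data used in this post)
incomes <- data.frame(
  Name   = c("Asha", "Ben", "Chitra", "Dev"),
  Income = c(52000, 31000, 78000, 45000),
  stringsAsFactors = FALSE
)

# order() rearranges nothing by itself: it returns the row positions
# that would put Income in ascending order
idx <- order(incomes$Income)
print(idx)                      # 2 4 1 3

# Indexing the rows with those positions (all columns kept) sorts the data frame;
# the original row names 2, 4, 1, 3 travel with the rows
incomes_sorted <- incomes[idx, ]
print(incomes_sorted)

# sort(), by contrast, returns the sorted values, not positions
print(sort(incomes$Income))     # 31000 45000 52000 78000
```

Because row names travel with the rows, the left-most numbers in the outputs of this post (299984, 623, and so on) are the original row names of Online_Retail, not new positions.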

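Here order(variables) supplies the row positions, and the empty slot after the comma keeps all columns. Consider the following examples: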

  • Sorting the data in ascending order
> # Sort rows by UnitPrice, smallest first
> Online.Retail_sort <- Online_Retail[order(Online_Retail$UnitPrice), ]
> head(Online.Retail_sort)

##        InvoiceNo StockCode     Description Quantity     InvoiceDate
## 299984   A563186         B Adjust bad debt        1 8/12/2011 14:51
## 299985   A563187         B Adjust bad debt        1 8/12/2011 14:52
## 623       536414     22139                       56 12/1/2010 11:52
## 1971      536545     21134                        1 12/1/2010 14:32
## 1972      536546     22145                        1 12/1/2010 14:33
## 1973      536547     37509                        1 12/1/2010 14:33
##        UnitPrice CustomerID        Country Quantity_indicator Price_Class
## 299984 -11062.06         NA United Kingdom                Low         Low
## 299985 -11062.06         NA United Kingdom                Low         Low
## 623         0.00         NA United Kingdom                Low         Low
## 1971        0.00         NA United Kingdom                Low         Low
## 1972        0.00         NA United Kingdom                Low         Low
## 1973        0.00         NA United Kingdom                Low         Low

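The code above sorts the rows of Online_Retail by UnitPrice in ascending order and stores the sorted data set in Online.Retail_sort. As the output shows, the lowest prices are negative (the "Adjust bad debt" rows), followed by zero prices. To sort in descending order, add a minus sign in front of the variable to be sorted. This works because UnitPrice is numeric, and negating a numeric variable reverses its order.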
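The minus-sign trick only works for numeric variables. The following sketch, on hypothetical toy vectors, compares it with the decreasing = TRUE argument of order() and with xtfrm(), which converts a character vector into numbers that sort the same way, so those numbers can be negated. The vectors `x` and `countries` are made up for illustration.

```r
# Hypothetical numeric vector standing in for a column such as UnitPrice
x <- c(3.5, -1, 10, 0)

# Negating a numeric vector reverses its order
desc_minus <- order(-x)
print(desc_minus)                  # 3 1 4 2

# decreasing = TRUE gives the same positions here without negation
desc_flag <- order(x, decreasing = TRUE)
print(desc_flag)                   # 3 1 4 2

# Hypothetical character vector standing in for a column such as Country
countries <- c("France", "Australia", "United Kingdom")

# -countries stops with an error, because unary minus needs numbers
minus_error <- tryCatch(-countries, error = function(e) conditionMessage(e))
print(minus_error)

# decreasing = TRUE works for character vectors ...
desc_chr <- order(countries, decreasing = TRUE)
print(desc_chr)                    # 3 1 2

# ... and so does negating xtfrm(), which maps the strings to sortable numbers
desc_xtfrm <- order(-xtfrm(countries))
print(desc_xtfrm)                  # 3 1 2
```

The xtfrm() form is useful when keys need different directions in one order() call, since decreasing = TRUE applies to every key at once in the default setup.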

> # Minus sign: sort rows by UnitPrice, largest first
> Online.Retail_sort1 <- Online_Retail[order(-Online_Retail$UnitPrice), ]
> head(Online.Retail_sort1)

##        InvoiceNo StockCode Description Quantity     InvoiceDate UnitPrice
## 222682   C556445         M      Manual       -1 6/10/2011 15:31  38970.00
## 524603   C580605 AMAZONFEE  AMAZON FEE       -1 12/5/2011 11:36  17836.46
## 43703    C540117 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:55  16888.02
## 43704    C540118 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:57  16453.71
## 15017    C537630 AMAZONFEE  AMAZON FEE       -1 12/7/2010 15:04  13541.33
## 15018     537632 AMAZONFEE  AMAZON FEE        1 12/7/2010 15:08  13541.33
##        CustomerID        Country Quantity_indicator Price_Class
## 222682      15098 United Kingdom                Low        High
## 524603         NA United Kingdom                Low        High
## 43703          NA United Kingdom                Low        High
## 43704          NA United Kingdom                Low        High
## 15017          NA United Kingdom                Low        High
## 15018          NA United Kingdom                Low        High

The code above sorts the data by UnitPrice in descending order and stores the result in a new data frame, Online.Retail_sort1.
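The bank example from the start of this post follows the same pattern: sort by income in descending order, then keep the top rows with head(). This is a minimal sketch on a hypothetical `taxpayers` data frame with made-up figures; keeping three rows is an arbitrary choice.

```r
# Hypothetical taxpayer data; names and incomes are made up for illustration
taxpayers <- data.frame(
  Name   = c("Asha", "Ben", "Chitra", "Dev", "Esha"),
  Income = c(52000, 31000, 78000, 45000, 91000),
  stringsAsFactors = FALSE
)

# Sort rows by Income, highest first (minus sign on a numeric column)
by_income <- taxpayers[order(-taxpayers$Income), ]

# Keep only the top 3 taxpayers
top3 <- head(by_income, 3)
print(top3)
```

Without the second argument, head() returns six rows, which is why each output in this post shows six rows.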

Sorting based on multiple variables

Data can also be sorted on two or more variables at once. order() sorts by the first variable and uses each later variable only to break ties in the earlier ones.

In the code below, Country is sorted in ascending (alphabetical) order and, within each country, Quantity is sorted in descending (numeric) order. The result is stored in Online.Retail_sort2.

> # Country A to Z; within each country, largest Quantity first
> Online.Retail_sort2 <- Online_Retail[order(Online_Retail$Country, -Online_Retail$Quantity), ]
> head(Online.Retail_sort2)

The first rows of Online.Retail_sort2 belong to the alphabetically first country, and within that country the rows with the largest Quantity come first.
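A small sketch with hypothetical data makes the tie-breaking visible. The `sales` data frame is made up for illustration. Quantity only matters among rows with the same Country, and rows that tie on every key keep their original relative order because order() is stable. The second call reverses Country with -xtfrm(), since a minus sign cannot be applied to a character column directly.

```r
# Hypothetical sales data in which several rows share the same Country
sales <- data.frame(
  Invoice  = c("A1", "A2", "A3", "A4", "A5"),
  Country  = c("France", "Australia", "France", "Australia", "France"),
  Quantity = c(5, 12, 20, 3, 5),
  stringsAsFactors = FALSE
)

# Country ascending; within each country, Quantity descending.
# A1 and A5 tie on both keys, so they keep their original order.
by_country <- sales[order(sales$Country, -sales$Quantity), ]
print(by_country)   # invoices A2, A4, A3, A1, A5

# Country descending, Quantity ascending: xtfrm() turns Country into
# numbers so that it can be negated like a numeric column
mixed <- sales[order(-xtfrm(sales$Country), sales$Quantity), ]
print(mixed)        # invoices A1, A5, A3, A4, A2
```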

In the next post, we will learn about An Example of Sorting the Data.
