In a previous post, we looked at Calculated Fields in R.
Sorting is a fundamental part of data analysis. A user might want to sort names in alphabetical order, or sort income data to see who earns the most. Sorting arranges the data in ascending or descending order so that the records of interest come first. For example, suppose a bank wants to find the highest taxpayers in a city. It would sort the data by Income in descending order and then select the top entries from the list. In R, the built-in function order() is used to sort data in ascending or descending order. The syntax for sorting the data is:
>Newdata <- Olddata[order(variables),]
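To make the syntax concrete, here is a minimal sketch on a small made-up data frame (the name `sales` and its values are hypothetical and are not part of the Online_Retail data). It illustrates that order() returns the row positions that would put a column in ascending order, and that indexing the rows with those positions gives the sorted data frame.

```r
# Hypothetical toy data, used only for illustration
sales <- data.frame(
  Item      = c("Pen", "Lamp", "Mug", "Bag"),
  UnitPrice = c(1.25, 12.50, 4.00, 9.99),
  stringsAsFactors = FALSE
)

# order() returns row positions, not the sorted values themselves
idx <- order(sales$UnitPrice)
idx
# [1] 1 3 4 2

# Indexing the rows by those positions gives the sorted data frame
sales_sorted <- sales[idx, ]
sales_sorted$Item
# [1] "Pen"  "Mug"  "Bag"  "Lamp"
```

The trailing comma inside the square brackets matters: it tells R to reorder rows and keep all columns.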
Consider the following examples.
- Sorting the data in Ascending order
>Online.Retail_sort<-Online_Retail[order(Online_Retail$UnitPrice),]
>head(Online.Retail_sort)
##        InvoiceNo StockCode     Description Quantity     InvoiceDate
## 299984   A563186         B Adjust bad debt        1 8/12/2011 14:51
## 299985   A563187         B Adjust bad debt        1 8/12/2011 14:52
## 623       536414     22139                       56 12/1/2010 11:52
## 1971      536545     21134                        1 12/1/2010 14:32
## 1972      536546     22145                        1 12/1/2010 14:33
## 1973      536547     37509                        1 12/1/2010 14:33
##        UnitPrice CustomerID        Country Quantity_indicator Price_Class
## 299984 -11062.06         NA United Kingdom                Low         Low
## 299985 -11062.06         NA United Kingdom                Low         Low
## 623         0.00         NA United Kingdom                Low         Low
## 1971        0.00         NA United Kingdom                Low         Low
## 1972        0.00         NA United Kingdom                Low         Low
## 1973        0.00         NA United Kingdom                Low         Low
The code above sorts the data by UnitPrice in ascending order and stores the sorted data set in Online.Retail_sort, as shown in the output. To sort the data in descending order, add a minus sign in front of the variable to be sorted. This works for numeric variables such as UnitPrice.
>Online.Retail_sort1<-Online_Retail[order(-Online_Retail$UnitPrice),]
>head(Online.Retail_sort1)
##        InvoiceNo StockCode Description Quantity     InvoiceDate UnitPrice
## 222682   C556445         M      Manual       -1 6/10/2011 15:31  38970.00
## 524603   C580605 AMAZONFEE  AMAZON FEE       -1 12/5/2011 11:36  17836.46
## 43703    C540117 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:55  16888.02
## 43704    C540118 AMAZONFEE  AMAZON FEE       -1   1/5/2011 9:57  16453.71
## 15017    C537630 AMAZONFEE  AMAZON FEE       -1 12/7/2010 15:04  13541.33
## 15018     537632 AMAZONFEE  AMAZON FEE        1 12/7/2010 15:08  13541.33
##        CustomerID        Country Quantity_indicator Price_Class
## 222682      15098 United Kingdom                Low        High
## 524603         NA United Kingdom                Low        High
## 43703          NA United Kingdom                Low        High
## 43704          NA United Kingdom                Low        High
## 15017          NA United Kingdom                Low        High
## 15018          NA United Kingdom                Low        High
The code above sorts the data by UnitPrice in descending order and stores it in the new variable Online.Retail_sort1.
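The minus sign only works on numeric columns. As a hedged sketch on made-up data (the `staff` data frame and its values are hypothetical), base R offers two alternatives for descending order: the `decreasing = TRUE` argument of order(), and negating the numeric sort key that xtfrm() produces for a character column.

```r
# Hypothetical toy data, used only for illustration
staff <- data.frame(
  Name   = c("Mia", "Zoe", "Ann"),
  Income = c(52000, 48000, 61000),
  stringsAsFactors = FALSE
)

# Numeric column: the minus sign and decreasing = TRUE give the same order
by_minus <- staff[order(-staff$Income), ]
by_flag  <- staff[order(staff$Income, decreasing = TRUE), ]
by_minus$Name
# [1] "Ann" "Mia" "Zoe"

# Character column: -staff$Name is an error, so use decreasing = TRUE
# or negate the numeric sort key produced by xtfrm()
by_name_desc <- staff[order(staff$Name, decreasing = TRUE), ]
by_name_key  <- staff[order(-xtfrm(staff$Name)), ]
by_name_desc$Name
# [1] "Zoe" "Mia" "Ann"
```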
- Sorting based on multiple variables
A data set can also be sorted by two or more variables at once.
In the code below, Country is sorted in ascending (alphabetical) order and, within each country, Quantity is sorted in descending (numeric) order. The result is stored in Online.Retail_sort2 and previewed with head().
>Online.Retail_sort2<-Online_Retail[order(Online_Retail$Country, -Online_Retail$Quantity),]
>head(Online.Retail_sort2)
Because Country is the first sort key, head() returns rows for the country that comes first alphabetically, beginning with that country's largest Quantity values.
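As an illustration of how a two-key sort behaves, here is a small sketch on made-up data (the `orders` data frame and its values are hypothetical, not taken from Online_Retail). It sorts Country ascending and Quantity descending within each country, first with the minus sign and then, as an alternative, with a per-key `decreasing` vector, which order() accepts when `method = "radix"` is used.

```r
# Hypothetical toy data, used only for illustration
orders <- data.frame(
  Country  = c("Spain", "France", "Spain", "France", "Spain"),
  Quantity = c(5, 12, 20, 3, 8),
  stringsAsFactors = FALSE
)

# Country ascending, then Quantity descending within each country
sorted <- orders[order(orders$Country, -orders$Quantity), ]
sorted$Country
# [1] "France" "France" "Spain"  "Spain"  "Spain"
sorted$Quantity
# [1] 12  3 20  8  5

# Same ordering without the minus sign: with method = "radix",
# decreasing can hold one TRUE/FALSE value per sort key
sorted2 <- orders[order(orders$Country, orders$Quantity,
                        decreasing = c(FALSE, TRUE), method = "radix"), ]
```

The per-key form is useful when the second key is a character column, where the minus sign is not available.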
In the next post, we will learn about An Example of Sorting the Data.