Consider the sales data for a superstore that has departments in different countries. The data set can easily be split by country. It contains all the information related to the consumers' purchase attributes. Use this data to solve the questions below.
1. Import “./Superstore Sales Data/Sales_by_country_v1.csv” data
2. Perform the basic checks on the data
3. How many rows and columns are there in this dataset?
4. Print only column names in the dataset
5. Print first 10 observations
6. Print the last 5 observations
7. Get the summary of the dataset
8. Print the structure of the data
9. Describe the fields unitsSold and custCountry
10. Create a new dataset by taking first 30 observations from this data
11. Print the resultant data
12. Remove (delete) the new dataset
Solution:
1. Import "Superstore Sales Data\Sales_by_country_v1.csv" data
>Sales_data <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Superstore Sales Data\\Sales_by_country_v1.csv")
>head(Sales_data)
## custId custName custCountry productSold
## 1 23262 Candice Levy Congo SUPA101
## 2 23263 Xerxes Smith Panama DETA200
## 3 23264 Levi Douglas Tanzania, United Republic of DETA800
## 4 23265 Uriel Benton South Africa SUPA104
## 5 23266 Celeste Pugh Gabon PURA200
## 6 23267 Vance Campos Syrian Arab Republic PURA100
## salesChannel unitsSold dateSold
## 1 Retail 117 2012-08-09
## 2 Online 73 2012-07-06
## 3 Online 205 2012-08-18
## 4 Online 14 2012-08-05
## 5 Retail 170 2012-08-11
## 6 Retail 129 2012-07-11
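The absolute Windows path above only works on one machine. The following is a minimal sketch of a more portable import, not the author's method. It assumes the CSV lives in a folder named "Superstore Sales Data", as in question 1. So that it runs anywhere, it first writes a three-row stand-in file (the first rows shown above) to a temporary folder. The names data_dir, csv_path, sample_rows and Sales_sample are hypothetical.

```r
# Stand-in for the real file: a temporary folder holding the first three rows shown above.
data_dir <- file.path(tempdir(), "Superstore Sales Data")
dir.create(data_dir, showWarnings = FALSE)
csv_path <- file.path(data_dir, "Sales_by_country_v1.csv")

sample_rows <- data.frame(
  custId       = c(23262, 23263, 23264),
  custName     = c("Candice Levy", "Xerxes Smith", "Levi Douglas"),
  custCountry  = c("Congo", "Panama", "Tanzania, United Republic of"),
  productSold  = c("SUPA101", "DETA200", "DETA800"),
  salesChannel = c("Retail", "Online", "Online"),
  unitsSold    = c(117, 73, 205),
  dateSold     = c("2012-08-09", "2012-07-06", "2012-08-18")
)
write.csv(sample_rows, csv_path, row.names = FALSE)

# file.path() joins the pieces with forward slashes, so no "\\" escaping is needed.
# stringsAsFactors = TRUE matches the Factor columns in the output shown in this post;
# since R 4.0.0 the default is FALSE and text columns load as character.
Sales_sample <- read.csv(csv_path, stringsAsFactors = TRUE)
head(Sales_sample)
```

With the real data, only the read.csv() call is needed, pointed at the actual location of the file.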
2. Perform the basic checks on the data
>dim(Sales_data)
## [1] 998 7
>head(Sales_data)
## custId custName custCountry productSold
## 1 23262 Candice Levy Congo SUPA101
## 2 23263 Xerxes Smith Panama DETA200
## 3 23264 Levi Douglas Tanzania, United Republic of DETA800
## 4 23265 Uriel Benton South Africa SUPA104
## 5 23266 Celeste Pugh Gabon PURA200
## 6 23267 Vance Campos Syrian Arab Republic PURA100
## salesChannel unitsSold dateSold
## 1 Retail 117 2012-08-09
## 2 Online 73 2012-07-06
## 3 Online 205 2012-08-18
## 4 Online 14 2012-08-05
## 5 Retail 170 2012-08-11
## 6 Retail 129 2012-07-11
>str(Sales_data)
## 'data.frame': 998 obs. of 7 variables:
## $ custId : int 23262 23263 23264 23265 23266 23267 23268 23269 23270 23271 ...
## $ custName : Factor w/ 998 levels "Aaron Edwards",..: 183 969 612 929 195 937 593 482 956 77 ...
## $ custCountry : Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...
## $ productSold : Factor w/ 12 levels "DETA100","DETA200",..: 8 2 3 11 5 4 1 4 10 10 ...
## $ salesChannel: Factor w/ 3 levels "Direct","Online",..: 3 2 2 2 3 3 3 3 2 3 ...
## $ unitsSold : int 117 73 205 14 170 129 82 116 67 125 ...
## $ dateSold : Factor w/ 464 levels "2011-01-02","2011-01-03",..: 446 416 454 442 448 421 422 386 388 434 ...
>tail(Sales_data)
## custId custName custCountry productSold salesChannel
## 993 24254 Anika Alford Belize DETA800 Online
## 994 24255 Ethan Day Tajikistan DETA100 Online
## 995 24256 Quail Knox Tonga PURA500 Retail
## 996 24257 Noelle Sargent Ireland DETA800 Direct
## 997 24258 Kuame Wallace Montserrat SUPA103 Online
## 998 24259 Lester Fisher Cocos (Keeling) Islands PURA500 Direct
## unitsSold dateSold
## 993 6 2011-07-08
## 994 189 2011-01-09
## 995 43 2011-05-08
## 996 17 2011-02-04
## 997 80 2011-01-13
## 998 138 2011-08-10
3. How many rows and columns are there in this dataset?
>dim(Sales_data)
## [1] 998 7
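dim() returns the row count first and the column count second. As a small sketch of how to pull the two numbers out separately, the example below uses a hypothetical three-row data frame named toy in place of Sales_data.

```r
# Hypothetical stand-in for Sales_data: 3 rows, 2 columns.
toy <- data.frame(
  unitsSold    = c(117L, 73L, 205L),
  salesChannel = c("Retail", "Online", "Online")
)

dims <- dim(toy)     # integer vector: rows first, then columns
n_rows <- nrow(toy)  # rows only
n_cols <- ncol(toy)  # columns only

dims
n_rows
n_cols
```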
4. Print only column names in the dataset
>names(Sales_data)
## [1] "custId" "custName" "custCountry" "productSold"
## [5] "salesChannel" "unitsSold" "dateSold"
5. Print first 10 observations
>head(Sales_data, n=10)
## custId custName custCountry productSold
## 1 23262 Candice Levy Congo SUPA101
## 2 23263 Xerxes Smith Panama DETA200
## 3 23264 Levi Douglas Tanzania, United Republic of DETA800
## 4 23265 Uriel Benton South Africa SUPA104
## 5 23266 Celeste Pugh Gabon PURA200
## 6 23267 Vance Campos Syrian Arab Republic PURA100
## 7 23268 Latifah Wall Guadeloupe DETA100
## 8 23269 Jane Hernandez Macedonia PURA100
## 9 23270 Wanda Garza Kyrgyzstan SUPA103
## 10 23271 Athena Fitzpatrick Reunion SUPA103
## salesChannel unitsSold dateSold
## 1 Retail 117 2012-08-09
## 2 Online 73 2012-07-06
## 3 Online 205 2012-08-18
## 4 Online 14 2012-08-05
## 5 Retail 170 2012-08-11
## 6 Retail 129 2012-07-11
## 7 Retail 82 2012-07-12
## 8 Retail 116 2012-06-03
## 9 Online 67 2012-06-07
## 10 Retail 125 2012-07-27
OR
>Sales_data[c(1:10),]
## custId custName custCountry productSold
## 1 23262 Candice Levy Congo SUPA101
## 2 23263 Xerxes Smith Panama DETA200
## 3 23264 Levi Douglas Tanzania, United Republic of DETA800
## 4 23265 Uriel Benton South Africa SUPA104
## 5 23266 Celeste Pugh Gabon PURA200
## 6 23267 Vance Campos Syrian Arab Republic PURA100
## 7 23268 Latifah Wall Guadeloupe DETA100
## 8 23269 Jane Hernandez Macedonia PURA100
## 9 23270 Wanda Garza Kyrgyzstan SUPA103
## 10 23271 Athena Fitzpatrick Reunion SUPA103
## salesChannel unitsSold dateSold
## 1 Retail 117 2012-08-09
## 2 Online 73 2012-07-06
## 3 Online 205 2012-08-18
## 4 Online 14 2012-08-05
## 5 Retail 170 2012-08-11
## 6 Retail 129 2012-07-11
## 7 Retail 82 2012-07-12
## 8 Retail 116 2012-06-03
## 9 Online 67 2012-06-07
## 10 Retail 125 2012-07-27
6. Print the last 5 observations
>tail(Sales_data, n=5)
## custId custName custCountry productSold salesChannel
## 994 24255 Ethan Day Tajikistan DETA100 Online
## 995 24256 Quail Knox Tonga PURA500 Retail
## 996 24257 Noelle Sargent Ireland DETA800 Direct
## 997 24258 Kuame Wallace Montserrat SUPA103 Online
## 998 24259 Lester Fisher Cocos (Keeling) Islands PURA500 Direct
## unitsSold dateSold
## 994 189 2011-01-09
## 995 43 2011-05-08
## 996 17 2011-02-04
## 997 80 2011-01-13
## 998 138 2011-08-10
7. Get the summary of the dataset
>summary(Sales_data)
## custId custName custCountry
## Min. :23262 Aaron Edwards : 1 Denmark : 10
## 1st Qu.:23511 Abigail Cunningham: 1 Swaziland : 10
## Median :23761 Abraham Mcguire : 1 Turkey : 10
## Mean :23761 Acton Mendoza : 1 Azerbaijan : 9
## 3rd Qu.:24010 Acton Ratliff : 1 Bouvet Island: 9
## Max. :24259 Adam Blackburn : 1 Nauru : 9
## (Other) :992 (Other) :941
## productSold salesChannel unitsSold dateSold
## PURA100:112 Direct: 91 Min. : 1.00 2011-11-11: 7
## SUPA103: 90 Online:511 1st Qu.: 52.25 2012-05-15: 7
## DETA800: 89 Retail:396 Median :111.00 2012-01-08: 6
## DETA100: 87 Mean :108.26 2012-02-20: 6
## SUPA102: 86 3rd Qu.:163.00 2012-04-06: 6
## PURA200: 84 Max. :212.00 2012-04-21: 6
## (Other):450 (Other) :960
8. Print the structure of the data
>str(Sales_data)
## 'data.frame': 998 obs. of 7 variables:
## $ custId : int 23262 23263 23264 23265 23266 23267 23268 23269 23270 23271 ...
## $ custName : Factor w/ 998 levels "Aaron Edwards",..: 183 969 612 929 195 937 593 482 956 77 ...
## $ custCountry : Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...
## $ productSold : Factor w/ 12 levels "DETA100","DETA200",..: 8 2 3 11 5 4 1 4 10 10 ...
## $ salesChannel: Factor w/ 3 levels "Direct","Online",..: 3 2 2 2 3 3 3 3 2 3 ...
## $ unitsSold : int 117 73 205 14 170 129 82 116 67 125 ...
## $ dateSold : Factor w/ 464 levels "2011-01-02","2011-01-03",..: 446 416 454 442 448 421 422 386 388 434 ...
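9. Describe the fields unitsSold and custCountry
str() describes one object per call, so each field gets its own call.
>str(Sales_data$unitsSold)
## int [1:998] 117 73 205 14 170 129 82 116 67 125 ...
>str(Sales_data$custCountry)
## Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...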
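Beyond str(), a numeric field and a categorical field are usually described with different functions. The sketch below is one possible approach, not the only one. It uses a hypothetical six-row data frame named toy, built from the first six rows shown earlier, in place of Sales_data.

```r
# Hypothetical stand-in built from the first six rows shown earlier in this post.
toy <- data.frame(
  custCountry = factor(c("Congo", "Panama", "Tanzania, United Republic of",
                         "South Africa", "Gabon", "Syrian Arab Republic")),
  unitsSold   = c(117L, 73L, 205L, 14L, 170L, 129L)
)

# Structure of both fields in one call: select the columns, then pass a single object to str().
str(toy[, c("unitsSold", "custCountry")])

# Numeric field: minimum, quartiles, mean and maximum.
units_summary <- summary(toy$unitsSold)

# Categorical field: number of distinct countries and rows per country.
n_countries    <- nlevels(toy$custCountry)
country_counts <- table(toy$custCountry)

units_summary
n_countries
country_counts
```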
10. Create a new dataset by taking first 30 observations from this data
>Sales_data_new <- Sales_data[c(1:30),]
>dim(Sales_data_new)
## [1] 30 7
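Taking the first rows by index and taking them with head() should give the same result. The sketch below checks this on a hypothetical 40-row data frame named toy, used in place of Sales_data.

```r
# Hypothetical stand-in for Sales_data: 40 rows, 2 columns.
toy <- data.frame(
  custId    = 23262:23301,
  unitsSold = rep(c(117L, 73L, 205L, 14L), times = 10)
)

by_index <- toy[1:30, ]        # same rows as toy[c(1:30), ]; c() is not needed around 1:30
by_head  <- head(toy, n = 30)  # head() also returns the first n rows

dim(by_index)
all.equal(by_index, by_head)
```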
11. Print the resultant data
>print(Sales_data_new)
This prints the new data set to the console, where all 30 rows can be inspected.
12. Remove (delete) the new dataset
>rm(Sales_data_new)
rm() returns nothing visible, so no output is displayed. Assigning Sales_data_new <- NULL is sometimes used instead, but it only replaces the contents with NULL: the name Sales_data_new stays in the workspace, whereas rm() removes it entirely.
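Assigning NULL and calling rm() do not do the same thing. The small sketch below shows the difference on a hypothetical two-row stand-in for Sales_data_new. The helper names bound_after_null and bound_after_rm are hypothetical as well.

```r
# Hypothetical stand-in for the 30-row subset.
Sales_data_new <- data.frame(unitsSold = c(117L, 73L))

# Assigning NULL frees the data, but the name is still bound in the workspace.
Sales_data_new <- NULL
bound_after_null <- exists("Sales_data_new")
value_is_null    <- is.null(Sales_data_new)

# rm() removes the name itself.
rm(Sales_data_new)
bound_after_rm <- exists("Sales_data_new")

bound_after_null
value_is_null
bound_after_rm
```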
In the next post you will see a subsetting example.