
103.2.2.a Hands-on Exercise – Basic Commands on Data

Learn by practise.

Consider the sales data of a superstore that has departments in different countries. The data can easily be split by country, and it records the attributes of each customer purchase: customer, country, product, sales channel, units sold and date of sale. Use this data to solve the questions below.

1. Import the “./Superstore Sales Data/Sales_by_country_v1.csv” data
2. Perform the basic checks on the data
3. How many rows and columns are there in this dataset?
4. Print only column names in the dataset
5. Print first 10 observations
6. Print the last 5 observations
7. Get the summary of the dataset
8. Print the structure of the data
9. Describe the field unitsSold, custCountry
10. Create a new dataset by taking first 30 observations from this data
11. Print the resultant data
12. Remove (delete) the new dataset

Solution:

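1. Import "./Superstore Sales Data/Sales_by_country_v1.csv" data

The path below is relative to the working directory, so set the working directory (for example with setwd()) to the folder that contains "Superstore Sales Data". Since R 4.0.0, read.csv() reads text columns as character vectors by default, so stringsAsFactors = TRUE is needed to reproduce the Factor columns in the str() output below.

>Sales_data <- read.csv("./Superstore Sales Data/Sales_by_country_v1.csv", stringsAsFactors = TRUE)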
>head(Sales_data)

##   custId     custName                  custCountry productSold
## 1  23262 Candice Levy                        Congo     SUPA101
## 2  23263 Xerxes Smith                       Panama     DETA200
## 3  23264 Levi Douglas Tanzania, United Republic of     DETA800
## 4  23265 Uriel Benton                 South Africa     SUPA104
## 5  23266 Celeste Pugh                        Gabon     PURA200
## 6  23267 Vance Campos         Syrian Arab Republic     PURA100
##   salesChannel unitsSold   dateSold
## 1       Retail       117 2012-08-09
## 2       Online        73 2012-07-06
## 3       Online       205 2012-08-18
## 4       Online        14 2012-08-05
## 5       Retail       170 2012-08-11
## 6       Retail       129 2012-07-11
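The effect of stringsAsFactors can be checked without the original file. This sketch passes read.csv() an inline copy of the header and the first three rows printed above through its text argument, which stands in for the real CSV path here. Only the class of the text columns differs between the two calls.

```r
# Inline stand-in for the CSV file: header plus the first three rows shown above.
# The country containing a comma must be quoted, exactly as in a real CSV file.
csv_text <- 'custId,custName,custCountry,productSold,salesChannel,unitsSold,dateSold
23262,Candice Levy,Congo,SUPA101,Retail,117,2012-08-09
23263,Xerxes Smith,Panama,DETA200,Online,73,2012-07-06
23264,Levi Douglas,"Tanzania, United Republic of",DETA800,Online,205,2012-08-18'

# Default since R 4.0.0: text columns are read as character vectors
sales_chr <- read.csv(text = csv_text)

# stringsAsFactors = TRUE gives the Factor columns seen in str(Sales_data)
sales_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

sapply(sales_fac, class)  # integer for custId and unitsSold, factor elsewhere
dim(sales_fac)            # 3 7
```

Numeric columns such as unitsSold are read as integers either way; the argument only changes how text columns are stored.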

2. Perform the basic checks on the data

>dim(Sales_data)
## [1] 998   7
>head(Sales_data)
##   custId     custName                  custCountry productSold
## 1  23262 Candice Levy                        Congo     SUPA101
## 2  23263 Xerxes Smith                       Panama     DETA200
## 3  23264 Levi Douglas Tanzania, United Republic of     DETA800
## 4  23265 Uriel Benton                 South Africa     SUPA104
## 5  23266 Celeste Pugh                        Gabon     PURA200
## 6  23267 Vance Campos         Syrian Arab Republic     PURA100
##   salesChannel unitsSold   dateSold
## 1       Retail       117 2012-08-09
## 2       Online        73 2012-07-06
## 3       Online       205 2012-08-18
## 4       Online        14 2012-08-05
## 5       Retail       170 2012-08-11
## 6       Retail       129 2012-07-11
>str(Sales_data)
## 'data.frame':    998 obs. of  7 variables:
##  $ custId      : int  23262 23263 23264 23265 23266 23267 23268 23269 23270 23271 ...
##  $ custName    : Factor w/ 998 levels "Aaron Edwards",..: 183 969 612 929 195 937 593 482 956 77 ...
##  $ custCountry : Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...
##  $ productSold : Factor w/ 12 levels "DETA100","DETA200",..: 8 2 3 11 5 4 1 4 10 10 ...
##  $ salesChannel: Factor w/ 3 levels "Direct","Online",..: 3 2 2 2 3 3 3 3 2 3 ...
##  $ unitsSold   : int  117 73 205 14 170 129 82 116 67 125 ...
##  $ dateSold    : Factor w/ 464 levels "2011-01-02","2011-01-03",..: 446 416 454 442 448 421 422 386 388 434 ...

>tail(Sales_data)

##     custId       custName             custCountry productSold salesChannel
## 993  24254   Anika Alford                  Belize     DETA800       Online
## 994  24255      Ethan Day              Tajikistan     DETA100       Online
## 995  24256     Quail Knox                   Tonga     PURA500       Retail
## 996  24257 Noelle Sargent                 Ireland     DETA800       Direct
## 997  24258  Kuame Wallace              Montserrat     SUPA103       Online
## 998  24259  Lester Fisher Cocos (Keeling) Islands     PURA500       Direct
##     unitsSold   dateSold
## 993         6 2011-07-08
## 994       189 2011-01-09
## 995        43 2011-05-08
## 996        17 2011-02-04
## 997        80 2011-01-13
## 998       138 2011-08-10
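Beyond dim(), head(), str() and tail(), two checks often added at this stage are counting missing values per column and confirming that an ID column has no duplicates. The sketch runs them on a hypothetical six-row miniature built from the rows printed above; on the real data you would pass Sales_data instead.

```r
# Hypothetical miniature of Sales_data: three columns of the first six rows shown above
sales <- data.frame(
  custId      = c(23262L, 23263L, 23264L, 23265L, 23266L, 23267L),
  custCountry = c("Congo", "Panama", "Tanzania, United Republic of",
                  "South Africa", "Gabon", "Syrian Arab Republic"),
  unitsSold   = c(117L, 73L, 205L, 14L, 170L, 129L),
  stringsAsFactors = TRUE
)

colSums(is.na(sales))        # missing values per column: all 0 here
anyDuplicated(sales$custId)  # 0 means every custId is unique
sapply(sales, class)         # column types at a glance
```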

3. How many rows and columns are there in this dataset?
>dim(Sales_data)
## [1] 998   7
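dim() returns the number of rows first and the number of columns second; nrow() and ncol() return each number on its own, which is handy when only one of them is needed. A small sketch on a hypothetical six-row data frame taken from the rows shown earlier:

```r
# Hypothetical two-column miniature built from the first six rows shown above
sales <- data.frame(
  custId    = c(23262L, 23263L, 23264L, 23265L, 23266L, 23267L),
  unitsSold = c(117L, 73L, 205L, 14L, 170L, 129L)
)

dim(sales)    # 6 2  (rows, then columns)
nrow(sales)   # 6
ncol(sales)   # 2
```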

4. Print only column names in the dataset
>names(Sales_data)
## [1] "custId"       "custName"     "custCountry"  "productSold" 
## [5] "salesChannel" "unitsSold"    "dateSold"

5. Print first 10 observations
>head(Sales_data, n=10)

##    custId           custName                  custCountry productSold
## 1   23262       Candice Levy                        Congo     SUPA101
## 2   23263       Xerxes Smith                       Panama     DETA200
## 3   23264       Levi Douglas Tanzania, United Republic of     DETA800
## 4   23265       Uriel Benton                 South Africa     SUPA104
## 5   23266       Celeste Pugh                        Gabon     PURA200
## 6   23267       Vance Campos         Syrian Arab Republic     PURA100
## 7   23268       Latifah Wall                   Guadeloupe     DETA100
## 8   23269     Jane Hernandez                    Macedonia     PURA100
## 9   23270        Wanda Garza                   Kyrgyzstan     SUPA103
## 10  23271 Athena Fitzpatrick                      Reunion     SUPA103
##    salesChannel unitsSold   dateSold
## 1        Retail       117 2012-08-09
## 2        Online        73 2012-07-06
## 3        Online       205 2012-08-18
## 4        Online        14 2012-08-05
## 5        Retail       170 2012-08-11
## 6        Retail       129 2012-07-11
## 7        Retail        82 2012-07-12
## 8        Retail       116 2012-06-03
## 9        Online        67 2012-06-07
## 10       Retail       125 2012-07-27

OR

>Sales_data[1:10, ]

##    custId           custName                  custCountry productSold
## 1   23262       Candice Levy                        Congo     SUPA101
## 2   23263       Xerxes Smith                       Panama     DETA200
## 3   23264       Levi Douglas Tanzania, United Republic of     DETA800
## 4   23265       Uriel Benton                 South Africa     SUPA104
## 5   23266       Celeste Pugh                        Gabon     PURA200
## 6   23267       Vance Campos         Syrian Arab Republic     PURA100
## 7   23268       Latifah Wall                   Guadeloupe     DETA100
## 8   23269     Jane Hernandez                    Macedonia     PURA100
## 9   23270        Wanda Garza                   Kyrgyzstan     SUPA103
## 10  23271 Athena Fitzpatrick                      Reunion     SUPA103
##    salesChannel unitsSold   dateSold
## 1        Retail       117 2012-08-09
## 2        Online        73 2012-07-06
## 3        Online       205 2012-08-18
## 4        Online        14 2012-08-05
## 5        Retail       170 2012-08-11
## 6        Retail       129 2012-07-11
## 7        Retail        82 2012-07-12
## 8        Retail       116 2012-06-03
## 9        Online        67 2012-06-07
## 10       Retail       125 2012-07-27
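The two approaches above return the same rows, and wrapping 1:10 in c() is not needed. They differ when n is larger than the number of rows: head() stops at the last row, while indexing fills the missing positions with NA rows. A sketch on a hypothetical six-row data frame:

```r
# Hypothetical six-row miniature built from the rows shown above
sales <- data.frame(
  custId    = c(23262L, 23263L, 23264L, 23265L, 23266L, 23267L),
  unitsSold = c(117L, 73L, 205L, 14L, 170L, 129L)
)

first3_head  <- head(sales, n = 3)  # first three rows via head()
first3_index <- sales[1:3, ]        # same rows via [rows, columns] indexing

nrow(head(sales, n = 10))  # 6: head() never goes past the last row
nrow(sales[1:10, ])        # 10: rows 7 to 10 are filled with NA
```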

6. Print the last 5 observations
>tail(Sales_data, n=5)

##     custId       custName             custCountry productSold salesChannel
## 994  24255      Ethan Day              Tajikistan     DETA100       Online
## 995  24256     Quail Knox                   Tonga     PURA500       Retail
## 996  24257 Noelle Sargent                 Ireland     DETA800       Direct
## 997  24258  Kuame Wallace              Montserrat     SUPA103       Online
## 998  24259  Lester Fisher Cocos (Keeling) Islands     PURA500       Direct
##     unitsSold   dateSold
## 994       189 2011-01-09
## 995        43 2011-05-08
## 996        17 2011-02-04
## 997        80 2011-01-13
## 998       138 2011-08-10
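tail() also has an indexing equivalent, built from nrow(). The parentheses matter: (n - 4):n selects the last five rows, while n - 4:n would subtract each of 4..n from n. A sketch on a hypothetical six-row data frame, taking the last two rows:

```r
# Hypothetical six-row miniature built from the rows shown above
sales <- data.frame(
  custId    = c(23262L, 23263L, 23264L, 23265L, 23266L, 23267L),
  unitsSold = c(117L, 73L, 205L, 14L, 170L, 129L)
)

n <- nrow(sales)
last2_tail  <- tail(sales, n = 2)
# (n - 1):n is 5:6, whereas n - 1:n would be 5 4 3 2 1 0
last2_index <- sales[(n - 1):n, ]

rownames(last2_index)  # "5" "6": original row numbers are kept, as with tail()
```

This is why the tail() output above is numbered 994 to 998 rather than 1 to 5.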

7. Get the summary of the dataset

>summary(Sales_data)

##      custId                    custName          custCountry
##  Min.   :23262   Aaron Edwards     :  1   Denmark      : 10
##  1st Qu.:23511   Abigail Cunningham:  1   Swaziland    : 10
##  Median :23761   Abraham Mcguire   :  1   Turkey       : 10
##  Mean   :23761   Acton Mendoza     :  1   Azerbaijan   :  9
##  3rd Qu.:24010   Acton Ratliff     :  1   Bouvet Island:  9
##  Max.   :24259   Adam Blackburn    :  1   Nauru        :  9
##                  (Other)           :992   (Other)      :941
##   productSold  salesChannel   unitsSold            dateSold
##  PURA100:112   Direct: 91   Min.   :  1.00   2011-11-11:  7
##  SUPA103: 90   Online:511   1st Qu.: 52.25   2012-05-15:  7
##  DETA800: 89   Retail:396   Median :111.00   2012-01-08:  6
##  DETA100: 87                Mean   :108.26   2012-02-20:  6
##  SUPA102: 86                3rd Qu.:163.00   2012-04-06:  6
##  PURA200: 84                Max.   :212.00   2012-04-21:  6
##  (Other):450                                 (Other)   :960
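summary() works column by column: a numeric column gets the six-number summary shown above for unitsSold, and a factor gets a count per level, as for salesChannel. The sketch applies it to the first six unitsSold and salesChannel values printed earlier, used here as hypothetical stand-ins for the full columns.

```r
units   <- c(117L, 73L, 205L, 14L, 170L, 129L)  # unitsSold, first six rows
channel <- factor(c("Retail", "Online", "Online", "Online", "Retail", "Retail"))

s <- summary(units)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
s
summary(channel)     # count per level: Online 3, Retail 3
```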

8. Print the structure of the data

>str(Sales_data)

## 'data.frame':    998 obs. of  7 variables:
##  $ custId      : int  23262 23263 23264 23265 23266 23267 23268 23269 23270 23271 ...
##  $ custName    : Factor w/ 998 levels "Aaron Edwards",..: 183 969 612 929 195 937 593 482 956 77 ...
##  $ custCountry : Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...
##  $ productSold : Factor w/ 12 levels "DETA100","DETA200",..: 8 2 3 11 5 4 1 4 10 10 ...
##  $ salesChannel: Factor w/ 3 levels "Direct","Online",..: 3 2 2 2 3 3 3 3 2 3 ...
##  $ unitsSold   : int  117 73 205 14 170 129 82 116 67 125 ...
##  $ dateSold    : Factor w/ 464 levels "2011-01-02","2011-01-03",..: 446 416 454 442 448 421 422 386 388 434 ...

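9. Describe the fields unitsSold and custCountry

str() describes one object per call. Passing two columns as str(Sales_data$unitsSold, Sales_data$custCountry) describes only the first one, because the second argument is not treated as another object to describe. Call it once per field:

>str(Sales_data$unitsSold)
##  int [1:998] 117 73 205 14 170 129 82 116 67 125 ...
>str(Sales_data$custCountry)
##  Factor w/ 233 levels "Afghanistan",..: 49 160 204 191 74 201 83 122 112 169 ...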
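Both fields can also be described in one str() call by first selecting them as a two-column data frame, and helpers such as range() and nlevels() give quick per-field facts. A sketch on a hypothetical six-row miniature of the two columns:

```r
# Hypothetical miniature of the two fields, first six rows shown above
sales <- data.frame(
  custCountry = c("Congo", "Panama", "Tanzania, United Republic of",
                  "South Africa", "Gabon", "Syrian Arab Republic"),
  unitsSold   = c(117L, 73L, 205L, 14L, 170L, 129L),
  stringsAsFactors = TRUE
)

# One str() call on a column subset describes both fields together
str(sales[, c("unitsSold", "custCountry")])

range(sales$unitsSold)      # smallest and largest units sold: 14 205
nlevels(sales$custCountry)  # number of distinct countries: 6
```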

10. Create a new dataset by taking first 30 observations from this data
>Sales_data_new <- Sales_data[1:30, ]
>dim(Sales_data_new)
## [1] 30  7
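The subset is an independent data frame: changing it, or deleting it in step 12, leaves Sales_data untouched, because R copies the data when one of the two objects is modified. A sketch on a hypothetical six-row data frame:

```r
# Hypothetical six-row miniature built from the rows shown above
sales <- data.frame(
  custId    = c(23262L, 23263L, 23264L, 23265L, 23266L, 23267L),
  unitsSold = c(117L, 73L, 205L, 14L, 170L, 129L)
)

sales_new <- sales[1:3, ]     # new dataset: first three rows
sales_new$unitsSold[1] <- 0L  # modify the new dataset only

sales$unitsSold[1]      # still 117: the original is unchanged
sales_new$unitsSold[1]  # 0
```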

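11. Print the resultant data
>print(Sales_data_new)

This prints all 30 rows of the new dataset to the console. R is case-sensitive, so the function must be written print(). Typing Sales_data_new on its own at the prompt gives the same result, because R auto-prints the value of a top-level expression.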

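12. Remove (delete) the new dataset
>rm(Sales_data_new)

rm() returns invisibly, so nothing is displayed in the output. Assigning Sales_data_new <- NULL is not an equivalent: the name Sales_data_new still exists afterwards, only its value becomes NULL. (Assigning NULL does delete a column of a data frame, as in Sales_data$dateSold <- NULL, but not a whole object.)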
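The difference between rm() and assigning NULL can be checked with exists(), which reports whether a name is currently defined. A sketch with a hypothetical three-row data frame:

```r
Sales_data_new <- data.frame(unitsSold = c(117L, 73L, 205L))
rm(Sales_data_new)
removed_exists <- exists("Sales_data_new")  # FALSE: the name is gone

Sales_data_new <- data.frame(unitsSold = c(117L, 73L, 205L))
Sales_data_new <- NULL
null_exists <- exists("Sales_data_new")     # TRUE: the name is still bound, to NULL

c(removed_exists, null_exists, is.null(Sales_data_new))  # FALSE TRUE TRUE
```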

The next post walks through a subsetting example.

20th June 2017

Statinfer

© 2020. All Rights Reserved.