In previous post we saw An Example of Handling Duplicates
In R, we have merge() function to join the data sets into one. Syntax for merge function is:
x <- merge(datasets1, dataset2, by="primary_key")
The function above merges the dataset1 and dataset2, and saves the merged data into the variable x. We need to mention the type of join otherwise the function does inner join by default.
With a single primary key, the syntax is
Newdata <- merge(dataone, datatwo, by = "primary_key")With composite keys or more than one key
With composite keys or more than one key we can join two datasets. The syntax is:
Newdata <- merge(dataone, datatwo ,by=c("primary_key1","primary_key2")
- Joining the data can happen in one of the four ways:
- Inner Join
- Outer Join
- Left Outer Join
- Right Outer JoinThe value given for the parameter “all” or a “all.x” will decide the type of join.
Inner Join
Newdata <- merge(dataone, datatwo ,by="primary_key", all=FALSE)
Outer Join
Newdata <- merge(dataone, datatwo ,by="primary_key", all=TRUE)
Left Outer Join
Newdata <- merge(dataone, datatwo ,by="primary_key", all.x=TRUE)
Right Outer Join
Newdata <- merge(dataone, datatwo ,by="primary_key", all.y=TRUE)
Below is an example:
>orders <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\TV Commercial Slots Analysis\\orders.csv") >slots <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\TV Commercial Slots Analysis\\slots.csv")
Inner Join
Merge by using single key “Unique_id”
>Order_Slot_Common <- merge(orders, slots, by="Unique_id", all=FALSE) >head(Order_Slot_Common)
## Unique_id AD_ID.x Date Time ## 1 SPYMYA2CBSH416510.833333333333333 SPYMYA2CBSH 1/12/2014 8:00 PM ## 2 SPYMYA60C008416520.375 SPYMYA60C008 1/13/2014 9:00 AM ## 3 SPYMYA60C008416540.375 SPYMYA60C008 1/15/2014 9:00 AM ## 4 SPYMYA60C008416550.458333333333333 SPYMYA60C008 1/16/2014 11:00 AM ## 5 SPYMYA60C008416560.375 SPYMYA60C008 1/17/2014 9:00 AM ## 6 SPYMYA60C008416570.666666666666667 SPYMYA60C008 1/18/2014 4:00 PM ## Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders ## 1 MSDR02 SPYMYA CBS 120 3 ## 2 MSDR02 SPYMYA TBKIDS 60 1 ## 3 MSDR02 SPYMYA TBKIDS 60 1 ## 4 MSDR02 SPYMYA TBKIDS 60 1 ## 5 MSDR02 SPYMYA TBKIDS 60 1 ## 6 MSDR02 SPYMYA TBKIDS 60 1 ## AD_ID.y Air.Date Air.Time Time.Zone Client.Product.Code.y ## 1 SPYMYA2CBSH 1/12/2014 8:00 PM EST MSDR02 ## 2 SPYMYA60C008 1/13/2014 9:00 AM EST MSDR02 ## 3 SPYMYA60C008 1/15/2014 9:00 AM EST MSDR02 ## 4 SPYMYA60C008 1/16/2014 11:00 AM EST MSDR02 ## 5 SPYMYA60C008 1/17/2014 9:00 AM EST MSDR02 ## 6 SPYMYA60C008 1/18/2014 4:00 PM EST MSDR02 ## Product.ID.y Length.y Network.ID.y Network.Name Network.Type ## 1 SPYMYA 120 CBS CBS Broadcast Station ## 2 SPYMYA 60 TBKIDS The Band Kids Cable ## 3 SPYMYA 60 TBKIDS The Band Kids Cable ## 4 SPYMYA 60 TBKIDS The Band Kids Cable ## 5 SPYMYA 60 TBKIDS The Band Kids Cable ## 6 SPYMYA 60 TBKIDS The Band Kids Cable ## Network.DMA Network.Feed HH.Impressions HH.Ratings Spot.Cost ## 1 National 18418 15.91 4000 ## 2 Regional NA NA 1000 ## 3 Regional NA NA 1000 ## 4 Regional NA NA 1000 ## 5 Regional NA NA 1050 ## 6 Regional NA NA 1000 ## Spot.Cost.Discounted ## 1 3800.0 ## 2 950.0 ## 3 950.0 ## 4 950.0 ## 5 997.5 ## 6 950.0
Left Outer Join
Merge by using single key “Unique_id”
>Order_all_Slot_match<-merge(orders, slots, by="Unique_id", all.x=TRUE) >head(Order_all_Slot_match)
OUTPUT:
## Unique_id AD_ID.x Date Time ## 1 MYSPYR60A002416540.375 MYSPYR60A002 1/15/2014 9:00 AM ## 2 MYSPYR60A002416620.708333333333333 MYSPYR60A002 1/23/2014 5:00 PM ## 3 MYSPYR60A002416630.416666666666667 MYSPYR60A002 1/24/2014 10:00 AM ## 4 MYSPYR60A002416640.625 MYSPYR60A002 1/25/2014 3:00 PM ## 5 MYSPYR60A004416500.666666666666667 MYSPYR60A004 1/11/2014 4:00 PM ## 6 MYSPYR60A004416520.916666666666667 MYSPYR60A004 1/13/2014 10:00 PM ## Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders AD_ID.y ## 1 MSDR02R MYSPYR ESQ 60 1 <NA> ## 2 MSDR02R MYSPYR ESQ 60 1 <NA> ## 3 MSDR02R MYSPYR ESQ 60 1 <NA> ## 4 MSDR02R MYSPYR ESQ 60 1 <NA> ## 5 MSDR02R MYSPYR NHLN 60 1 <NA> ## 6 MSDR02R MYSPYR NHLN 60 1 <NA> ## Air.Date Air.Time Time.Zone Client.Product.Code.y Product.ID.y Length.y ## 1 <NA> <NA> <NA> <NA> <NA> NA ## 2 <NA> <NA> <NA> <NA> <NA> NA ## 3 <NA> <NA> <NA> <NA> <NA> NA ## 4 <NA> <NA> <NA> <NA> <NA> NA ## 5 <NA> <NA> <NA> <NA> <NA> NA ## 6 <NA> <NA> <NA> <NA> <NA> NA ## Network.ID.y Network.Name Network.Type Network.DMA Network.Feed ## 1 <NA> <NA> <NA> <NA> <NA> ## 2 <NA> <NA> <NA> <NA> <NA> ## 3 <NA> <NA> <NA> <NA> <NA> ## 4 <NA> <NA> <NA> <NA> <NA> ## 5 <NA> <NA> <NA> <NA> <NA> ## 6 <NA> <NA> <NA> <NA> <NA> ## HH.Impressions HH.Ratings Spot.Cost Spot.Cost.Discounted ## 1 NA NA NA NA ## 2 NA NA NA NA ## 3 NA NA NA NA ## 4 NA NA NA NA ## 5 NA NA NA NA ## 6 NA NA NA NA
Right Outer Join
Merge by using single key “Unique_id”
>Slot_all_order_match<-merge(orders, slots, by=”Unique_id”, all.y=TRUE)
>head(Slot_all_order_match)
## Unique_id AD_ID.x Date Time ## 1 SPYMYA2CBSH416510.833333333333333 SPYMYA2CBSH 1/12/2014 8:00 PM ## 2 SPYMYA60C008416520.375 SPYMYA60C008 1/13/2014 9:00 AM ## 3 SPYMYA60C008416540.375 SPYMYA60C008 1/15/2014 9:00 AM ## 4 SPYMYA60C008416550.458333333333333 SPYMYA60C008 1/16/2014 11:00 AM ## 5 SPYMYA60C008416560.375 SPYMYA60C008 1/17/2014 9:00 AM ## 6 SPYMYA60C008416570.666666666666667 SPYMYA60C008 1/18/2014 4:00 PM ## Client.Product.Code.x Product.ID.x Network.ID.x Length.x Orders ## 1 MSDR02 SPYMYA CBS 120 3 ## 2 MSDR02 SPYMYA TBKIDS 60 1 ## 3 MSDR02 SPYMYA TBKIDS 60 1 ## 4 MSDR02 SPYMYA TBKIDS 60 1 ## 5 MSDR02 SPYMYA TBKIDS 60 1 ## 6 MSDR02 SPYMYA TBKIDS 60 1 ## AD_ID.y Air.Date Air.Time Time.Zone Client.Product.Code.y ## 1 SPYMYA2CBSH 1/12/2014 8:00 PM EST MSDR02 ## 2 SPYMYA60C008 1/13/2014 9:00 AM EST MSDR02 ## 3 SPYMYA60C008 1/15/2014 9:00 AM EST MSDR02 ## 4 SPYMYA60C008 1/16/2014 11:00 AM EST MSDR02 ## 5 SPYMYA60C008 1/17/2014 9:00 AM EST MSDR02 ## 6 SPYMYA60C008 1/18/2014 4:00 PM EST MSDR02 ## Product.ID.y Length.y Network.ID.y Network.Name Network.Type ## 1 SPYMYA 120 CBS CBS Broadcast Station ## 2 SPYMYA 60 TBKIDS The Band Kids Cable ## 3 SPYMYA 60 TBKIDS The Band Kids Cable ## 4 SPYMYA 60 TBKIDS The Band Kids Cable ## 5 SPYMYA 60 TBKIDS The Band Kids Cable ## 6 SPYMYA 60 TBKIDS The Band Kids Cable ## Network.DMA Network.Feed HH.Impressions HH.Ratings Spot.Cost ## 1 National 18418 15.91 4000 ## 2 Regional NA NA 1000 ## 3 Regional NA NA 1000 ## 4 Regional NA NA 1000 ## 5 Regional NA NA 1050 ## 6 Regional NA NA 1000 ## Spot.Cost.Discounted ## 1 3800.0 ## 2 950.0 ## 3 950.0 ## 4 950.0 ## 5 997.5 ## 6 950.0
Outer Join
Merge by using single key “Unique_id”
>Order_Slots_union<- merge(orders, slots, by="Unique_id", all=TRUE) >head(Order_Slots_union)
Order_Slots_union<- merge(orders, slots, by="Unique_id", all=TRUE) head(Order_Slots_union)
In the next post we will see An Example of Merge and Join.