Statinfer

103.2.7.a Merge and Join

Learn by Examples

In previous post we saw  Merge and Join

In the telecom customer complaints data, most of the complaints are generally related to the Services or the Bills. The data on Bill and Complaints will give us some relation between Bill payers and the Issues they are facing.

“./Telecom Data Analysis/Bill.csv”
“./Telecom Data Analysis/Complaints.csv”
1. Import the data and remove duplicates based on cust_id
Create a dataset for each of these requirements
2. All the customers who appear either in bill data or complaints data
3. All the customers who appear both in bill data and complaints data
4. All the customers from bill data: Customers who have bill data along with their complaints
5. All the customers from complaints data: Customers who have Complaints data along with their bill info

Solutions

>bills_data <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Telecom Data Analysis\\Bill.csv")
>head(bills_data)
>complaints_data <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Telecom Data Analysis\\Complaints.csv")
>head(complaints_data)
##      cust_id act_active_dt package_id bill_zip customer_segment status
## 1 9243148228     6/22/2006      54518   560095                S ACTIVE
## 2 9243264060     6/23/2006     617691   580029                S ACTIVE
## 3 8951061271     8/14/2010     616488   560037                F ACTIVE
## 4 8951033996     6/26/2010     616488   560062                R ACTIVE
## 5 9241079722     9/30/2005     614975   560017           G - TC ACTIVE
## 6 8472656385    12/22/2005     605662   585101                S ACTIVE
##   bill_amt
## 1   561.80
## 2   224.35
## 3  1689.04
## 4   344.04
## 5   478.33
## 6   199.08

##     comp_id  month  incident    cust_id sla.status.new
## 1 318691791 12-Jul 7/22/2012 9243584301   ClosedIN SLA
## 2 318691795 12-Jul 7/22/2012 9243584297   ClosedIN SLA
## 3 318691807 12-Jul 7/22/2012 9243584279   ClosedIN SLA
## 4 318691809 12-Jul 7/22/2012 9243584277   ClosedIN SLA
## 5 317997895 12-Jul 7/17/2012 9282441943   ClosedIN SLA
## 6 318030735 12-Jul 7/17/2012 7204112312   ClosedIN SLA
##               incident.type       type severity
## 1    Customer Communication VAS-OTHERS     High
## 2    Customer Communication VAS-OTHERS     High
## 3    Customer Communication VAS-OTHERS     High
## 4    Customer Communication VAS-OTHERS     High
## 5         Barring/Unbarring VAS-OTHERS     High
## 6 Name/Address Modification       1515      Low
  1. Import the data and remove duplicates based on cust_id
>Dupes_in_bills_cust_id <- duplicated(bills_data$cust_id)
>Bills_unique<-bills_data[!Dupes_in_bills_cust_id,]
>nrow(bills_data)
>nrow(Bills_unique)
>Dupes_in_Complaints_cust_id <- duplicated(complaints_data$cust_id)
>Complaints_unique<-complaints_data[!Dupes_in_Complaints_cust_id,]
>nrow(complaints_data)
>nrow(Complaints_unique)

This will output the following

## [1] 9462
## [1] 9389
## [1] 6587
## [1] 4856

Create a dataset for each of these requirements

  1. All the customers who appear either in bill data or complaints data
>all_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all=TRUE)
>head(all_customer)
>dim(all_customer)
>dim(Bills_unique)
>dim(Complaints_unique)
##   cust_id act_active_dt package_id bill_zip customer_segment status
## 1   61438          <NA>         NA       NA             <NA>   <NA>
## 2 1051629          <NA>         NA       NA             <NA>   <NA>
## 3 1081086          <NA>         NA       NA             <NA>   <NA>
## 4 2666633          <NA>         NA       NA             <NA>   <NA>
## 5 4199777          <NA>         NA       NA             <NA>   <NA>
## 6 9260076          <NA>         NA       NA             <NA>   <NA>
##   bill_amt   comp_id  month  incident   sla.status.new
## 1       NA 317161597 12-Jul 7/11/2012     ClosedIN SLA
## 2       NA 318488835 12-Jul 7/20/2012       OpenIN SLA
## 3       NA 316228241 12-Jul  7/4/2012       OpenIN SLA
## 4       NA 318230651 12-Jul 7/18/2012     ClosedIN SLA
## 5       NA 319023287 12-Jul 7/24/2012     ClosedIN SLA
## 6       NA 316164043 12-Jul  7/3/2012 ClosedOUT OF SLA
##                    incident.type       type severity
## 1 Cancellation for Data products VAS-OTHERS     High
## 2 Cancellation for Data products VAS-OTHERS     High
## 3           Cancellation Request VAS-OTHERS     High
## 4             Complaint on Churn   NON TECH      Low
## 5              Barring/Unbarring VAS-OTHERS     High
## 6                 Shift Premises     CB VAS     High
## [1] 13952    14
## [1] 9389    7
## [1] 4856    8
  1. All the customers who appear both in bill data and complaints data
>customer_in_both<-merge(Bills_unique,Complaints_unique,by="cust_id",all=FALSE)
>head(customer_in_both)
>dim(customer_in_both)
>dim(Bills_unique)
>dim(Complaints_unique)

 

##      cust_id act_active_dt package_id bill_zip customer_segment status
## 1 1402001553      3/2/2012     620117   560022                G ACTIVE
## 2 1402001554      3/2/2012     620117   560022                G ACTIVE
## 3 1402001555      3/2/2012     620117   560022                G ACTIVE
## 4 1402001556      3/2/2012     620117   560022                G ACTIVE
## 5 1402001557      3/2/2012     620117   560022                G ACTIVE
## 6 9241182632     4/30/2011     618402   560001                N ACTIVE
##   bill_amt   comp_id  month  incident sla.status.new
## 1     0.00 318388855 12-Jul 7/20/2012 OpenOUT OF SLA
## 2     0.00 318388859 12-Jul 7/20/2012 OpenOUT OF SLA
## 3     0.00 318388857 12-Jul 7/20/2012 OpenOUT OF SLA
## 4     0.00 318388863 12-Jul 7/20/2012 OpenOUT OF SLA
## 5     0.00 318388861 12-Jul 7/20/2012 OpenOUT OF SLA
## 6  1067.42 318144735 12-Jul 7/18/2012   ClosedIN SLA
##               incident.type       type  severity
## 1    Customer Communication VAS-OTHERS      High
## 2    Customer Communication VAS-OTHERS      High
## 3    Customer Communication VAS-OTHERS      High
## 4    Customer Communication VAS-OTHERS      High
## 5    Customer Communication VAS-OTHERS      High
## 6 198 - Type Trouble Ticket       TECH TC City A
## [1] 293  14
## [1] 9389    7
## [1] 4856    8
  1. All the customers from bill data: Customers who have bill data along with their complaints
>bill_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all.x=TRUE)
>head(bill_customer)
>dim(bill_customer)
>dim(Bills_unique)
>dim(bill_customer)
##      cust_id act_active_dt package_id bill_zip customer_segment status
## 1 1008001429     3/27/2009     610487   560094           G - TC ACTIVE
## 2 1008001761     3/21/2009     610487   560029           G - TC ACTIVE
## 3 1008001783     3/21/2009     610487   560029           G - TC ACTIVE
## 4 1008001811     4/13/2009     610487   560094           G - TC ACTIVE
## 5 1008001924      4/4/2009     610487   560094           G - TC ACTIVE
## 6 1008002082     4/13/2009     610487   560094           G - TC ACTIVE
##   bill_amt comp_id month incident sla.status.new incident.type type
## 1 11257.91      NA  <NA>     <NA>           <NA>          <NA> <NA>
## 2 12323.78      NA  <NA>     <NA>           <NA>          <NA> <NA>
## 3 11213.66      NA  <NA>     <NA>           <NA>          <NA> <NA>
## 4 14468.88      NA  <NA>     <NA>           <NA>          <NA> <NA>
## 5 10360.15      NA  <NA>     <NA>           <NA>          <NA> <NA>
## 6  9825.04      NA  <NA>     <NA>           <NA>          <NA> <NA>
##   severity
## 1     <NA>
## 2     <NA>
## 3     <NA>
## 4     <NA>
## 5     <NA>
## 6     <NA>
## [1] 9389   14
## [1] 9389    7
## [1] 9389   14
  1. All the customers from complaints data: Customers who have Complaints data along with their bill info
>complaints_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all.y=TRUE)
>head(complaints_customer)
>dim(complaints_customer)

 

##   cust_id act_active_dt package_id bill_zip customer_segment status
## 1   61438          <NA>         NA       NA             <NA>   <NA>
## 2 1051629          <NA>         NA       NA             <NA>   <NA>
## 3 1081086          <NA>         NA       NA             <NA>   <NA>
## 4 2666633          <NA>         NA       NA             <NA>   <NA>
## 5 4199777          <NA>         NA       NA             <NA>   <NA>
## 6 9260076          <NA>         NA       NA             <NA>   <NA>
##   bill_amt   comp_id  month  incident   sla.status.new
## 1       NA 317161597 12-Jul 7/11/2012     ClosedIN SLA
## 2       NA 318488835 12-Jul 7/20/2012       OpenIN SLA
## 3       NA 316228241 12-Jul  7/4/2012       OpenIN SLA
## 4       NA 318230651 12-Jul 7/18/2012     ClosedIN SLA
## 5       NA 319023287 12-Jul 7/24/2012     ClosedIN SLA
## 6       NA 316164043 12-Jul  7/3/2012 ClosedOUT OF SLA
##                    incident.type       type severity
## 1 Cancellation for Data products VAS-OTHERS     High
## 2 Cancellation for Data products VAS-OTHERS     High
## 3           Cancellation Request VAS-OTHERS     High
## 4             Complaint on Churn   NON TECH      Low
## 5              Barring/Unbarring VAS-OTHERS     High
## 6                 Shift Premises     CB VAS     High
## [1] 4856   14

In the next post we will see   Exporting the Data in R.

23rd January 2017

0 responses on "103.2.7.a Merge and Join"

Leave a Message

Blog Posts

Hurry up!!!

"use coupon code for FLAT 30% discount"  datascientistoffer        ___________________________________      Subscribe to our youtube channel. Get access to video tutorials.                

Contact Us

Statinfer Software Solutions#647 2nd floor 1st Main, Indira Nagar 1st Stage, 100 feet road,Indranagar Bangalore,Karnataka, Pin code:-560038 Landmarks: Opp. Namma Metro Pillar 48.

Connect with us

linkin fn twitter g

How to become a Data Scientist.?

top