In previous post we saw Merge and Join
In the telecom customer complaints data, most of the complaints are generally related to the Services or the Bills. The data on Bill and Complaints will give us some relation between Bill payers and the Issues they are facing.
“./Telecom Data Analysis/Bill.csv”
“./Telecom Data Analysis/Complaints.csv”
1. Import the data and remove duplicates based on cust_id
Create a dataset for each of these requirements
2. All the customers who appear either in bill data or complaints data
3. All the customers who appear both in bill data and complaints data
4. All the customers from bill data: Customers who have bill data along with their complaints
5. All the customers from complaints data: Customers who have Complaints data along with their bill info
Solutions
>bills_data <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Telecom Data Analysis\\Bill.csv") >head(bills_data) >complaints_data <- read.csv("C:\\Users\\venk\\Google Drive\\Training\\Datasets\\Telecom Data Analysis\\Complaints.csv") >head(complaints_data)
## cust_id act_active_dt package_id bill_zip customer_segment status ## 1 9243148228 6/22/2006 54518 560095 S ACTIVE ## 2 9243264060 6/23/2006 617691 580029 S ACTIVE ## 3 8951061271 8/14/2010 616488 560037 F ACTIVE ## 4 8951033996 6/26/2010 616488 560062 R ACTIVE ## 5 9241079722 9/30/2005 614975 560017 G - TC ACTIVE ## 6 8472656385 12/22/2005 605662 585101 S ACTIVE ## bill_amt ## 1 561.80 ## 2 224.35 ## 3 1689.04 ## 4 344.04 ## 5 478.33 ## 6 199.08 ## comp_id month incident cust_id sla.status.new ## 1 318691791 12-Jul 7/22/2012 9243584301 ClosedIN SLA ## 2 318691795 12-Jul 7/22/2012 9243584297 ClosedIN SLA ## 3 318691807 12-Jul 7/22/2012 9243584279 ClosedIN SLA ## 4 318691809 12-Jul 7/22/2012 9243584277 ClosedIN SLA ## 5 317997895 12-Jul 7/17/2012 9282441943 ClosedIN SLA ## 6 318030735 12-Jul 7/17/2012 7204112312 ClosedIN SLA ## incident.type type severity ## 1 Customer Communication VAS-OTHERS High ## 2 Customer Communication VAS-OTHERS High ## 3 Customer Communication VAS-OTHERS High ## 4 Customer Communication VAS-OTHERS High ## 5 Barring/Unbarring VAS-OTHERS High ## 6 Name/Address Modification 1515 Low
- Import the data and remove duplicates based on cust_id
>Dupes_in_bills_cust_id <- duplicated(bills_data$cust_id) >Bills_unique<-bills_data[!Dupes_in_bills_cust_id,] >nrow(bills_data) >nrow(Bills_unique) >Dupes_in_Complaints_cust_id <- duplicated(complaints_data$cust_id) >Complaints_unique<-complaints_data[!Dupes_in_Complaints_cust_id,] >nrow(complaints_data) >nrow(Complaints_unique)
This will output the following
## [1] 9462 ## [1] 9389 ## [1] 6587 ## [1] 4856
Create a dataset for each of these requirements
- All the customers who appear either in bill data or complaints data
>all_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all=TRUE) >head(all_customer) >dim(all_customer) >dim(Bills_unique) >dim(Complaints_unique)
## cust_id act_active_dt package_id bill_zip customer_segment status ## 1 61438 <NA> NA NA <NA> <NA> ## 2 1051629 <NA> NA NA <NA> <NA> ## 3 1081086 <NA> NA NA <NA> <NA> ## 4 2666633 <NA> NA NA <NA> <NA> ## 5 4199777 <NA> NA NA <NA> <NA> ## 6 9260076 <NA> NA NA <NA> <NA> ## bill_amt comp_id month incident sla.status.new ## 1 NA 317161597 12-Jul 7/11/2012 ClosedIN SLA ## 2 NA 318488835 12-Jul 7/20/2012 OpenIN SLA ## 3 NA 316228241 12-Jul 7/4/2012 OpenIN SLA ## 4 NA 318230651 12-Jul 7/18/2012 ClosedIN SLA ## 5 NA 319023287 12-Jul 7/24/2012 ClosedIN SLA ## 6 NA 316164043 12-Jul 7/3/2012 ClosedOUT OF SLA ## incident.type type severity ## 1 Cancellation for Data products VAS-OTHERS High ## 2 Cancellation for Data products VAS-OTHERS High ## 3 Cancellation Request VAS-OTHERS High ## 4 Complaint on Churn NON TECH Low ## 5 Barring/Unbarring VAS-OTHERS High ## 6 Shift Premises CB VAS High
## [1] 13952 14 ## [1] 9389 7 ## [1] 4856 8
- All the customers who appear both in bill data and complaints data
>customer_in_both<-merge(Bills_unique,Complaints_unique,by="cust_id",all=FALSE) >head(customer_in_both) >dim(customer_in_both) >dim(Bills_unique) >dim(Complaints_unique)
## cust_id act_active_dt package_id bill_zip customer_segment status ## 1 1402001553 3/2/2012 620117 560022 G ACTIVE ## 2 1402001554 3/2/2012 620117 560022 G ACTIVE ## 3 1402001555 3/2/2012 620117 560022 G ACTIVE ## 4 1402001556 3/2/2012 620117 560022 G ACTIVE ## 5 1402001557 3/2/2012 620117 560022 G ACTIVE ## 6 9241182632 4/30/2011 618402 560001 N ACTIVE ## bill_amt comp_id month incident sla.status.new ## 1 0.00 318388855 12-Jul 7/20/2012 OpenOUT OF SLA ## 2 0.00 318388859 12-Jul 7/20/2012 OpenOUT OF SLA ## 3 0.00 318388857 12-Jul 7/20/2012 OpenOUT OF SLA ## 4 0.00 318388863 12-Jul 7/20/2012 OpenOUT OF SLA ## 5 0.00 318388861 12-Jul 7/20/2012 OpenOUT OF SLA ## 6 1067.42 318144735 12-Jul 7/18/2012 ClosedIN SLA ## incident.type type severity ## 1 Customer Communication VAS-OTHERS High ## 2 Customer Communication VAS-OTHERS High ## 3 Customer Communication VAS-OTHERS High ## 4 Customer Communication VAS-OTHERS High ## 5 Customer Communication VAS-OTHERS High ## 6 198 - Type Trouble Ticket TECH TC City A
## [1] 293 14 ## [1] 9389 7 ## [1] 4856 8
- All the customers from bill data: Customers who have bill data along with their complaints
>bill_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all.x=TRUE) >head(bill_customer) >dim(bill_customer) >dim(Bills_unique) >dim(bill_customer)
## cust_id act_active_dt package_id bill_zip customer_segment status ## 1 1008001429 3/27/2009 610487 560094 G - TC ACTIVE ## 2 1008001761 3/21/2009 610487 560029 G - TC ACTIVE ## 3 1008001783 3/21/2009 610487 560029 G - TC ACTIVE ## 4 1008001811 4/13/2009 610487 560094 G - TC ACTIVE ## 5 1008001924 4/4/2009 610487 560094 G - TC ACTIVE ## 6 1008002082 4/13/2009 610487 560094 G - TC ACTIVE ## bill_amt comp_id month incident sla.status.new incident.type type ## 1 11257.91 NA <NA> <NA> <NA> <NA> <NA> ## 2 12323.78 NA <NA> <NA> <NA> <NA> <NA> ## 3 11213.66 NA <NA> <NA> <NA> <NA> <NA> ## 4 14468.88 NA <NA> <NA> <NA> <NA> <NA> ## 5 10360.15 NA <NA> <NA> <NA> <NA> <NA> ## 6 9825.04 NA <NA> <NA> <NA> <NA> <NA> ## severity ## 1 <NA> ## 2 <NA> ## 3 <NA> ## 4 <NA> ## 5 <NA> ## 6 <NA>
## [1] 9389 14 ## [1] 9389 7 ## [1] 9389 14
- All the customers from complaints data: Customers who have Complaints data along with their bill info
>complaints_customer<-merge(Bills_unique,Complaints_unique,by="cust_id",all.y=TRUE) >head(complaints_customer) >dim(complaints_customer)
## cust_id act_active_dt package_id bill_zip customer_segment status ## 1 61438 <NA> NA NA <NA> <NA> ## 2 1051629 <NA> NA NA <NA> <NA> ## 3 1081086 <NA> NA NA <NA> <NA> ## 4 2666633 <NA> NA NA <NA> <NA> ## 5 4199777 <NA> NA NA <NA> <NA> ## 6 9260076 <NA> NA NA <NA> <NA> ## bill_amt comp_id month incident sla.status.new ## 1 NA 317161597 12-Jul 7/11/2012 ClosedIN SLA ## 2 NA 318488835 12-Jul 7/20/2012 OpenIN SLA ## 3 NA 316228241 12-Jul 7/4/2012 OpenIN SLA ## 4 NA 318230651 12-Jul 7/18/2012 ClosedIN SLA ## 5 NA 319023287 12-Jul 7/24/2012 ClosedIN SLA ## 6 NA 316164043 12-Jul 7/3/2012 ClosedOUT OF SLA ## incident.type type severity ## 1 Cancellation for Data products VAS-OTHERS High ## 2 Cancellation for Data products VAS-OTHERS High ## 3 Cancellation Request VAS-OTHERS High ## 4 Complaint on Churn NON TECH Low ## 5 Barring/Unbarring VAS-OTHERS High ## 6 Shift Premises CB VAS High
## [1] 4856 14 In the next post we will see Exporting the Data in R.