Link to the previous post : https://statinfer.com/104-2-7-identifying-and-removing-duplicate-values-from-dataset-in-python/

In this post we will learn how to merge two datasets in python using pandas library.

Data sets merging and Joining

Datasets:
- TV Commercial Slots Analysis/orders.csv
- TV Commercial Slots Analysis/slots.csv

In [103]:

orders=pd.read_csv("datasets\\TV Commercial Slots Analysis\\orders.csv")
orders.shape

Out[103]:

(1369, 9)

In [105]:

slots=pd.read_csv("datasets\\TV Commercial Slots Analysis\\slots.csv")
slots.shape

Out[105]:

(1764, 17)

In [108]:

# duplicates based on Unique_id
sum(orders.Unique_id.duplicated())

Out[108]:

In [109]:

# duplicates based on Unique_id
sum(slots.Unique_id.duplicated())

Out[109]:

In [110]:

#Removing Duplicates
orders1=orders.drop_duplicates(['Unique_id'])
slots1=slots.drop_duplicates(['Unique_id'])

In [111]:

sum(orders1.Unique_id.duplicated())

Out[111]:

In [112]:

sum(slots1.Unique_id.duplicated())

Out[112]:

Inner Join

In [113]:

inner_data=pd.merge(orders1, slots1, on='Unique_id', how='inner')
inner_data.shape

Out[113]:

(8, 25)

Outer Join

In [114]:

outer_data=pd.merge(orders1, slots1, on='Unique_id', how='outer')
outer_data.shape

Out[114]:

(3109, 25)

Left outer Join

In [115]:

L_outer_data=pd.merge(orders1, slots1, on='Unique_id', how='left')
L_outer_data.shape

Out[115]:

(1366, 25)

Righ outer Join

In [116]:

R_outer_data=pd.merge(orders1, slots1, on='Unique_id', how='right')
R_outer_data.shape

Out[116]:

(1751, 25)

LAB: Data Joins

Datasets
- “./Telecom Data Analysis/Bill.csv”
- “./Telecom Data Analysis/Complaints.csv”
Import the data and remove duplicates based on cust_id
Create a dataset for each of these requirements
All the customers who appear either in bill data or complaints data
All the customers who appear both in bill data and complaints data
All the customers from bill data: Customers who have bill data along with their complaints
All the customers from complaints data: Customers who have Complaints data along with their bill info

In [117]:

comp_data=pd.read_csv("datasets\\Telecom Data Analysis\\Complaints.csv")
comp_data.shape

Out[117]:

(6587, 8)

In [118]:

bill_data=pd.read_csv("datasets\\Telecom Data Analysis\\Bill.csv")
bill_data.shape

Out[118]:

(9462, 7)

In [119]:

#Import the data and remove duplicates based on cust_id
comp_data1=comp_data.drop_duplicates(['cust_id'])
comp_data1.shape

Out[119]:

(4856, 8)

In [120]:

bill_data1=bill_data.drop_duplicates(['cust_id'])
bill_data1.shape

Out[120]:

(9389, 7)

In [121]:

#All the customers who appear either in bill data or complaints data 
combined1=pd.merge(comp_data1, bill_data1, on='cust_id', how='outer')
combined1.shape

Out[121]:

(13952, 14)

In [122]:

#All the customers who appear both in bill data and complaints data
combined2=pd.merge(comp_data1, bill_data1, on='cust_id', how='inner')
combined2.shape

Out[122]:

(293, 14)

In [123]:

#All the customers from bill data: Customers who have bill data along with their complaints
combined3=pd.merge(comp_data1, bill_data1, on='cust_id', how='right')
combined3.shape

Out[123]:

(9389, 14)

In [124]:

#All the customers from complaints data: Customers who have Complaints data along with their bill info
combined4=pd.merge(comp_data1, bill_data1, on='cust_id', how='left')
combined4.shape

Out[124]:

(4856, 14)

104.2.8 Joining and Merging datasets in Python

Using merge function of pandas library

Data sets merging and Joining

Inner Join

Outer Join

Left outer Join

Righ outer Join

LAB: Data Joins

Statinfer

Statinfer

Statinfer

104.2.8 Joining and Merging datasets in Python

Using merge function of pandas library

Data sets merging and Joining

Inner Join

Outer Join

Left outer Join

Righ outer Join

LAB: Data Joins

Related Courses

Python(Batch6)

Statinfer

Tableau (Batch6)

Statinfer

PowerBI (Batch6)

Statinfer