Joins
- Two Tables
Retail_invoice = LOAD '/Retail_invoice_hdfs' USING PigStorage('\t') AS (uniq_idi:chararray, InvoiceNo:chararray, StockCode:chararray, Description:chararray, Quantity:int);
DESCRIBE Retail_invoice;
Retail_Customer = LOAD '/Retail_Customer_hdfs' USING PigStorage('\t') AS (uniq_idc:chararray, InvoiceDate:chararray, UnitPrice:int, CustomerID:chararray, Country:chararray);
DESCRIBE Retail_Customer;
Left Outer Join
Left_join = JOIN Retail_invoice BY uniq_idi LEFT OUTER, Retail_Customer BY uniq_idc;
DESCRIBE Left_join;
DUMP Left_join;
Right Outer Join
Right_join = JOIN Retail_invoice BY uniq_idi RIGHT OUTER, Retail_Customer BY uniq_idc;
DESCRIBE Right_join;
DUMP Right_join;
Full Outer Join
Full_join = JOIN Retail_invoice BY uniq_idi FULL OUTER, Retail_Customer BY uniq_idc;
DESCRIBE Full_join;
DUMP Full_join;
Inner Join
Inner_join = JOIN Retail_invoice BY uniq_idi, Retail_Customer BY uniq_idc;
DESCRIBE Inner_join;
DUMP Inner_join;
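To see how the four joins differ in which rows they keep, here is a minimal plain-Python sketch. It is a toy model, not how Pig executes a JOIN. The rows in `invoices` and `customers` and the helper `join` are made up for illustration, and `None` stands in for Pig's null.

```python
# Toy stand-ins for the two relations: (key, payload) tuples.
# These rows are invented for illustration; they are not from the retail data.
invoices = [("1", "inv-A"), ("2", "inv-B"), ("3", "inv-C")]
customers = [("2", "cust-X"), ("3", "cust-Y"), ("4", "cust-Z")]


def join(left, right, how="inner"):
    """Join two lists of (key, value) tuples on the key.

    how is one of "inner", "left", "right", "full".
    Unmatched sides are padded with None (playing the role of null).
    """
    rows = []
    matched_right = set()
    for lkey, lval in left:
        hits = [(i, r) for i, r in enumerate(right) if r[0] == lkey]
        for i, (rkey, rval) in hits:
            rows.append((lkey, lval, rkey, rval))
            matched_right.add(i)
        # Left and full outer joins keep left rows that found no partner.
        if not hits and how in ("left", "full"):
            rows.append((lkey, lval, None, None))
    # Right and full outer joins keep right rows that found no partner.
    if how in ("right", "full"):
        for i, (rkey, rval) in enumerate(right):
            if i not in matched_right:
                rows.append((None, None, rkey, rval))
    return rows


inner_rows = join(invoices, customers, "inner")
left_rows = join(invoices, customers, "left")
right_rows = join(invoices, customers, "right")
full_rows = join(invoices, customers, "full")

for name, rows in [("inner", inner_rows), ("left", left_rows),
                   ("right", right_rows), ("full", full_rows)]:
    print(name, len(rows))
```

With this toy data, keys 2 and 3 appear on both sides, key 1 only on the left, and key 4 only on the right, so the inner join returns 2 rows, the left and right outer joins 3 rows each, and the full outer join 4 rows.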
Storing the Results from Pig
- STORE is similar to exporting the final result table out of Pig.
- After the final analysis, the result is held in a relation in Pig.
- STORE exports that relation (the resulting data) out of Pig.
- STORE is the opposite of LOAD: LOAD reads data from HDFS into a relation.
- STORE writes data from a relation back to HDFS.
hadoop fs -ls /
STORE Inner_join INTO '/pig_Inner_join/' USING PigStorage (',');
hadoop fs -ls /
hadoop fs -rm -r /pig_datsets
Check the result in the HDFS file browser at http://localhost:50070/explorer.html#/
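The STORE and LOAD round trip can be pictured with a small plain-Python sketch. It is only a local-disk model of the idea, not Pig's implementation. The helpers `store` and `load`, the sample rows, and the file name `part-r-00000` are assumptions made for illustration. The sketch assumes that the output is a directory of delimited text files, that a null field is written as an empty string, and that writing into an existing directory is an error.

```python
import os
import tempfile

# Made-up joined rows; None stands in for a null field.
rows = [
    ("2", "inv-B", "2", "cust-X"),
    ("1", "inv-A", None, None),
]


def store(rows, out_dir, delimiter=","):
    """Write rows as delimited text into a part file inside out_dir."""
    os.makedirs(out_dir)  # raises FileExistsError if out_dir already exists
    path = os.path.join(out_dir, "part-r-00000")  # illustrative file name
    with open(path, "w") as f:
        for row in rows:
            # A null field becomes an empty string between delimiters.
            fields = ["" if v is None else str(v) for v in row]
            f.write(delimiter.join(fields) + "\n")
    return path


def load(out_dir, delimiter=","):
    """Read every file in out_dir back into tuples of strings."""
    result = []
    for name in sorted(os.listdir(out_dir)):
        with open(os.path.join(out_dir, name)) as f:
            for line in f:
                result.append(tuple(line.rstrip("\n").split(delimiter)))
    return result


out_dir = os.path.join(tempfile.mkdtemp(), "pig_Inner_join")
part_file = store(rows, out_dir)
loaded = load(out_dir)
print(loaded)
```

One design point this makes visible: the fields are split on the bare delimiter, so a value that itself contains a comma (a product description, for example) would come back as two fields. Choosing a delimiter that does not occur in the data, such as the tab used when loading, avoids that.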
History
- history lists all the commands we ran, in order
history