Hive Basic Vocabulary
Databases, Tables and more
- Databases:
  - Contain tables, views, partitions, etc.
- Tables:
  - Data records that share the same schema, for example a table of quarterly sales data.
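As a rough sketch of how these two terms show up at the Hive prompt, the statements below create a database and list what it contains. The name demo_db is hypothetical and is not used anywhere else in this walkthrough.

```sql
-- Hypothetical database name, used only for illustration.
CREATE DATABASE IF NOT EXISTS demo_db;
SHOW DATABASES;

USE demo_db;
-- Lists the tables in the current database (none right after creation).
SHOW TABLES;

-- Switch back to the default database, which the warehouse path
-- later in this walkthrough suggests is the one in use.
USE default;
```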
Data Types
- Integers: TINYINT, SMALLINT, INT, BIGINT.
- Boolean: BOOLEAN.
- Floating point numbers: FLOAT, DOUBLE.
- String: STRING.
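To make the list concrete, here is a minimal sketch of a table that uses one column per type. The table name type_demo and its column names are made up for illustration.

```sql
-- Hypothetical table: one column for each primitive type listed above.
CREATE TABLE IF NOT EXISTS type_demo (
  tiny_col   TINYINT,   -- 1-byte signed integer
  small_col  SMALLINT,  -- 2-byte signed integer
  int_col    INT,       -- 4-byte signed integer
  big_col    BIGINT,    -- 8-byte signed integer
  flag_col   BOOLEAN,   -- TRUE or FALSE
  float_col  FLOAT,     -- single-precision floating point
  double_col DOUBLE,    -- double-precision floating point
  text_col   STRING     -- character string
);

-- Print the column names and types back.
DESCRIBE type_demo;
```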
Hive Query Language
Create table
CREATE TABLE stack_overflow_tags (
  id BIGINT,
  title STRING,
  body STRING,
  tag1 STRING,
  tag2 STRING,
  tag3 STRING,
  tag4 STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\';
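One way to confirm that the table was created with the intended schema is to describe it. This is a small sketch that relies only on the table defined above.

```sql
-- Print the column names and types of the new table.
DESCRIBE stack_overflow_tags;

-- More detail, including the storage location and the field delimiter.
DESCRIBE FORMATTED stack_overflow_tags;
```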
The next step is to load data into the table. First, make sure that the dataset is on HDFS by listing the HDFS root from the Hive prompt:
dfs -ls / ;
At this point the table exists with its schema but contains no rows. Now load the data into the table:
LOAD DATA INPATH '/stack_overflow_hdfs' INTO TABLE stack_overflow_tags;
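One detail worth knowing: without the LOCAL keyword, LOAD DATA INPATH moves the source from its HDFS location into the table's warehouse directory rather than copying it, so /stack_overflow_hdfs should no longer show up in the HDFS root afterwards. The sketch below checks that rows arrived and shows, commented out, the LOCAL variant. The local file path in it is hypothetical.

```sql
-- Quick sanity check that the load produced rows.
SELECT COUNT(*) FROM stack_overflow_tags;

-- Variant, assuming the file sits on the local filesystem at this
-- hypothetical path. LOCAL copies the file instead of moving it.
-- Left commented out so the data is not loaded a second time.
-- LOAD DATA LOCAL INPATH '/home/hduser/stack_overflow.tsv' INTO TABLE stack_overflow_tags;
```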
Check the data in the Hive data warehouse:
http://localhost:50070/explorer.html#/user/hive/warehouse/stack_overflow_tags
Select first few rows
SELECT * FROM stack_overflow_tags LIMIT 3;
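Because body is a long text column, SELECT * can be hard to read in a terminal. As a sketch, the queries below select only a few columns and then count rows per first tag. They use only the columns defined in the CREATE TABLE statement.

```sql
-- Project a few short columns instead of every column.
SELECT id, title, tag1 FROM stack_overflow_tags LIMIT 3;

-- Count rows per first tag and show the five most frequent ones.
SELECT tag1, COUNT(*) AS n
FROM stack_overflow_tags
GROUP BY tag1
ORDER BY n DESC
LIMIT 5;
```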
Sending the output to a new file
INSERT OVERWRITE LOCAL DIRECTORY '/home/hduser/Output/hive_out1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM stack_overflow_tags LIMIT 3;
Have a look at the exported file. Do this in a new (non-Hive) terminal:
cat /home/hduser/Output/hive_out1/000000_0
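If opening a second terminal is inconvenient, the same check can probably be done from the Hive prompt itself. This sketch assumes the classic Hive CLI (not Beeline), where a line starting with ! is passed to the local shell.

```sql
-- Inside the Hive CLI, a leading ! runs a shell command on the local machine.
!ls /home/hduser/Output/hive_out1;
!cat /home/hduser/Output/hive_out1/000000_0;
```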