Sqoop Installation
Tested on Ubuntu 14.04 LTS using hadoop-2.6.0
Step 1: Download any stable release, move it to installation directory and untar it. Open the terminal and type the following commands to do so. In the below command, hduser is the hadoop user and you will be going to install SQOOP under hduser.
$ su - hduser
$ sudo wget http://apache.proserve.nl/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo tar -xzvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /usr/local/sqoop
$ sudo chown -R hduser:hadoop /usr/local/sqoop
Step 2: Update the .bashrc file
Open .bashrc file using nano editor by using the command given below:
$ sudo nano ~/.bashrc
Copy and paste the below lines at the end of .bashrc:
#SQOOP VARIABLES START
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
#SQOOP VARIABLES END
Use Ctrl+X and Y to save.
Your .bashrc will look like the image given below:
To make the above changes in .bashrc, then run the command given below:
$ source ~/.bashrc
Step 3: Now download MySQL Java Connector: You might have already installed MySQL on your system, if you have installed the hive. For downloading the Java Connector, then type the below commands.
$ sudo wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
$ sudo tar -xzvf mysql-connector-java-5.1.38.tar.gz
$ sudo cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /usr/local/sqoop/lib/
For confirming that sqoop installed successfully
$ sqoop version
You will get the output as shown in the image given below:
Installing MySQL Database Server If you have not installed MySQL yet, then no need to worry as you just use the command given below. Note that you can skip to step “SQOOP EXECUTION ENGINE” if you have already had installed MySQL.
$ sudo apt-get install mysql-server-5.5
(For the sake of convenience, set the password for user “root” as “root”.)
$ mysql -uroot -p
SQOOP EXECUTION ENGINE
Here, we are going to create a table called “student” in MySQL and we will import it in local file system and hadoop distributed file system using sqoop. Since we are going to use HDFS, we need to start the hadoop first.
$ start-dfs.sh
$ start-yarn.sh
Now create a table and insert some records in it, execute the follwing commands given below:
$ mysql -uroot -proot
mysql>create database test;
mysql>use test;
mysql>create table student(id int,name varchar(10));
mysql>insert into student values(1,'Anoop');
mysql>insert into student values(2,'Blessy');
mysql>insert into student values(3,'Maria');
mysql>insert into student values(4,'Sekhar');
mysql>exit
In order to test whether sqoop is working or not, then run these following commands from command line:
$ cd /usr/local/sqoop/bin
Command to generate a jar file
$ sqoop codegen --connect jdbc:mysql://localhost/test --username root -P --table student
You can take help from the image given below:
It will prompt for MySQL password thus we just need to give the password. This command will create a jar file < tablename >.jar, for this tutorial student.jar, under /tmp/sqoop-hduser compile/87c96b3f6ad4b8ec480d974144b4f3b4. The last directory under tmp may vary on your system. You need to take a bit care here, as this directory will be used in the subsequent commands.
You will get the output like the image given below.
Local Import: Import the table in Local File System:
$ sqoop import -fs local -jt local -libjars /tmp/sqoop-hduser/compile/87c96b3f6ad4b8ec480d974144b4f3b4/student.jar --connect jdbc:mysql://localhost/test --username root --P --table student
As the example table “student” do not have any primary key column, then we get an exception like the image given below:
So, we will use the “-m 1” field for the table which is having no primary key column.
$ sqoop import -fs local -jt local -libjars /tmp/sqoop-hduser/compile/87c96b3f6ad4b8ec480d974144b4f3b4/student.jar --connect jdbc:mysql://localhost/test --username root --P --table student -m 1
You can take help from the image given below:
You might get an exception that the FileAlreadyExistException like the highlighted part in the image given below:
Here you need not to get worry, we just need to remove the table using the command give below:
$ sudo rm -r student
And again execute the previous command like the image given below:
$ sqoop import -fs local -jt local -libjars /tmp/sqoop-hduser/compile/87c96b3f6ad4b8ec480d974144b4f3b4/student.jar --connect jdbc:mysql://localhost/test --username root --P --table student -m 1
You can take help from the image given below:
Once this command runs successfully, you will get the output like the image given below:
And also a directory called student will get generated in the local file system in the pwd, from where you ran above three commands. You can see the contents of sqoop generated directory “student” using commands given below:
$ cd student
$ cat part-m-00000
You can take help from the image given below:
HDFS Import:
The below command import the contents of student table into the HDFS directory /student. Here, in the HDFS, the student directory must not be available otherwise you will get some exception.
$ sqoop import -fs local -libjars /tmp/sqoop-hduser/compile/87c96b3f6ad4b8ec480d974144b4f3b4/student.jar --connect jdbc:mysql://localhost/test --username root --P --table student -m 1 --target-dir hdfs://localhost:54310/student
You can take help from the image given below:
Once this command runs successfully, you will get the output like the image given below:
And also contents of the student table, that had been created earlier, will move in the HDFS student directory. You can use cat command to see the contents of student by using the command given below:
$ hadoop fs -cat /student/part-m-00000
You will get the output like the image given below:
This ends sqoop installation and verfication.