Apache Flume Installation
Using Apache Flume, you can fetch data from various services and transport it to centralized stores. In this tutorial, you will install Apache Flume and test your installation with a simple example.
Step 1: Log In as the Hadoop User
This tutorial assumes Hadoop is already installed under a user named hduser, and Flume will be installed as the same user. So the first step is to log in as hduser:
$ su - hduser
Step 2: Download a Stable Release
Download a stable Flume release from the Apache website, whose download page looks like the image given below:
This post was tested with apache-flume-1.6.0. When you click the release archive, your browser prompts you to save it, as shown in the image given below:
Click OK to download.
Step 3: Extract and Move
Go to your download folder, replace “user” in the path with your own username, and run the following commands:
$ cd /home/user/Downloads/
$ sudo tar -zxvf apache-flume-1.6.0-bin.tar.gz
$ sudo mv apache-flume-1.6.0-bin /usr/local/flume
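The extract-and-move commands can be wrapped in a function so the same steps work for a different release without retyping the directory name. This is a minimal sketch, not part of Flume: `install_flume_tarball` is a hypothetical helper name, and it assumes the tarball unpacks into a single top-level directory, as apache-flume-1.6.0-bin.tar.gz does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: unpack a Flume release tarball and move its
# top-level directory to DEST (for example /usr/local/flume).
# Assumes the archive has exactly one top-level directory.
install_flume_tarball() {
  local tarball="$1" dest="$2" topdir staging
  # First listing entry, e.g. "apache-flume-1.6.0-bin/" -> "apache-flume-1.6.0-bin"
  topdir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  if [ -z "$topdir" ]; then
    echo "could not read $tarball" >&2
    return 1
  fi
  # Extract into a scratch directory so nothing lands in the current directory
  staging=$(mktemp -d) || return 1
  tar -xzf "$tarball" -C "$staging" || { rm -rf "$staging"; return 1; }
  # Same effect as: mv apache-flume-1.6.0-bin /usr/local/flume
  # (if DEST already exists as a directory, mv places the release inside it)
  mv "$staging/$topdir" "$dest" || { rm -rf "$staging"; return 1; }
  rmdir "$staging"
}

# Example:
#   install_flume_tarball /home/user/Downloads/apache-flume-1.6.0-bin.tar.gz /usr/local/flume
```

Because /usr/local is normally writable only by root, define and call the function from a root shell (for example after `sudo -i`) if you use it for the real installation.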
The result is shown in the image given below:
Step 4: Update .bashrc
Update the .bashrc file to export the Flume environment variables.
$ sudo nano ~/.bashrc
Paste the following lines at the end of .bashrc:
#FLUME VARIABLES START
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export FLUME_CLASSPATH=$FLUME_CONF_DIR
export PATH=$PATH:$FLUME_HOME/bin
#FLUME VARIABLES END
The edited file looks like the image given below:
Press Ctrl+X, then Y and Enter, to save the file and exit nano.
The change to .bashrc takes effect in every new shell. To load the variables into your current shell as well, run:
$ source ~/.bashrc
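If you script this step, it is worth appending the block only once, so that running the script again does not add duplicate exports. A minimal sketch, assuming the `#FLUME VARIABLES START` marker line shown above is used to detect an earlier run; `add_flume_vars` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the Flume exports to an rc file (default ~/.bashrc)
# unless the "#FLUME VARIABLES START" marker is already present.
add_flume_vars() {
  local rc="${1:-$HOME/.bashrc}"
  if [ -f "$rc" ] && grep -qx '#FLUME VARIABLES START' "$rc"; then
    echo "Flume variables already present in $rc"
    return 0
  fi
  # The quoted 'EOF' keeps $FLUME_HOME, $PATH, etc. literal in the file,
  # so they are expanded later, when the rc file is sourced.
  cat >> "$rc" <<'EOF'
#FLUME VARIABLES START
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export FLUME_CLASSPATH=$FLUME_CONF_DIR
export PATH=$PATH:$FLUME_HOME/bin
#FLUME VARIABLES END
EOF
}

# Example:
#   add_flume_vars && source ~/.bashrc
```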
Step 5: Change the Owner and Group
Change the ownership of the Flume directory so that its owner is hduser and its group is hadoop:
$ sudo chown -R hduser:hadoop /usr/local/flume
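To confirm that the `chown` reached every file, you can list anything under the Flume directory that is not owned by the expected user and group. A minimal sketch using `find`'s `-user` and `-group` tests; `list_wrong_owner` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print every path under DIR whose owner is not USER
# or whose group is not GROUP. No output means ownership is as expected.
# USER and GROUP may be names or numeric IDs.
list_wrong_owner() {
  local dir="$1" user="$2" group="$3"
  find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example:
#   list_wrong_owner /usr/local/flume hduser hadoop
```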
Step 6: Create flume-env.sh
To create this file, copy flume-env.sh.template to flume-env.sh with the command given below:
$ sudo cp /usr/local/flume/conf/flume-env.sh.template /usr/local/flume/conf/flume-env.sh
Step 7: Update the flume-env.sh File
$ cd /usr/local/flume/conf
$ sudo nano flume-env.sh
Set the following lines in flume-env.sh, adjusting JAVA_HOME to match the Java installation on your system:
export JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Press Ctrl+X, then Y and Enter, to save the file and exit nano.
Your updated flume-env.sh should look like the image given below:
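If you prefer to make these edits without opening an editor, a small `awk` filter can replace an existing `export NAME=` line (including a commented-out one) or append a new line when none exists. This is a sketch under assumptions: `set_env_line` is a hypothetical helper name, and it only handles simple one-line `export` settings like the two above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set "export NAME=VALUE" in FILE.
# Replaces the first line starting with "export NAME=" (optionally preceded
# by "#" or blanks, i.e. a commented-out default); otherwise appends the line.
# Note: awk -v interprets backslash escapes, so VALUE should not contain backslashes.
set_env_line() {
  local file="$1" name="$2" value="$3" tmp rc
  tmp=$(mktemp) || return 1
  awk -v n="$name" -v l="export ${name}=${value}" '
    !done && $0 ~ ("^[# \t]*export " n "=") { print l; done = 1; next }
    { print }
    END { if (!done) print l }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps FILE's owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (values from this step):
#   set_env_line /usr/local/flume/conf/flume-env.sh JAVA_HOME /usr/lib/jvm/java-7-openjdk-amd64
#   set_env_line /usr/local/flume/conf/flume-env.sh JAVA_OPTS '"-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"'
```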
NOTE: You can check your JAVA_HOME using the command given below:
$ echo $JAVA_HOME
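If `echo $JAVA_HOME` prints nothing, one way to find a value for flume-env.sh is to resolve the `java` executable on your PATH and strip the trailing `bin/java` (plus `jre` for Java 7/8 packages such as java-7-openjdk-amd64). A minimal sketch, assuming a standard OpenJDK directory layout; `java_home_from_binary` is a hypothetical helper name, and `readlink -f` in the example comment is the GNU coreutils option that follows symlinks such as /usr/bin/java to the real file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn the resolved path of a java executable into a
# JAVA_HOME candidate. Assumes the usual JDK layout:
#   <home>/bin/java  or  <home>/jre/bin/java
java_home_from_binary() {
  local bin="$1"
  bin="${bin%/bin/java}"   # drop the trailing /bin/java
  bin="${bin%/jre}"        # Java 7/8 packages keep the java binary under jre/
  printf '%s\n' "$bin"
}

# Example:
#   java_home_from_binary "$(readlink -f "$(command -v java)")"
```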
Then change to Flume's bin directory:
$ cd ..
$ cd bin/
Step 8: Verify the Flume Installation
Run flume-ng without any arguments:
$ flume-ng
It prints its usage message, as shown in the image given below:
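This check can also be scripted. A minimal sketch, assuming the FLUME_HOME and PATH settings from Step 4 have been loaded; `check_flume_install` is a hypothetical helper that confirms `$FLUME_HOME/bin/flume-ng` exists and is the `flume-ng` your shell actually finds on PATH. On a working installation, `flume-ng version` prints the installed release.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sanity-check the Step 3 and Step 4 setup.
# Returns 0 when $FLUME_HOME/bin/flume-ng exists and is the flume-ng found on PATH.
check_flume_install() {
  if [ -z "${FLUME_HOME:-}" ]; then
    echo "FLUME_HOME is not set (did you source ~/.bashrc?)" >&2
    return 1
  fi
  local launcher="$FLUME_HOME/bin/flume-ng" found
  if [ ! -x "$launcher" ]; then
    echo "$launcher is missing or not executable" >&2
    return 1
  fi
  found=$(command -v flume-ng) || {
    echo "flume-ng is not on PATH" >&2
    return 1
  }
  if [ "$found" != "$launcher" ]; then
    echo "PATH resolves flume-ng to $found, not $launcher" >&2
    return 1
  fi
  echo "OK: $launcher"
}

# Example:
#   check_flume_install && flume-ng version
```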
A simple example
Here is an example configuration file that describes a single-node Flume deployment. This configuration lets you generate events and then logs them to the console.
Create a file named example.conf and paste in the following contents:
$ sudo gedit example.conf
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A single configuration file might define several named agents; when a Flume process is launched, a flag (--name) tells it which named agent to run.
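In this file a source feeds one or more channels (`.channels`) while a sink drains exactly one channel (`.channel`), so a common mistake is forgetting one of the binding lines at the end. A minimal sketch of a check using `sed` and `grep`; `check_wiring` is a hypothetical helper, and it assumes the simple one-setting-per-line `key = value` layout used in example.conf rather than every properties-file syntax.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check_wiring CONF AGENT
# For every source and sink listed for AGENT, confirm that its binding line exists:
#   AGENT.sources.<name>.channels = ...   (a source may feed several channels)
#   AGENT.sinks.<name>.channel = ...      (a sink drains exactly one channel)
check_wiring() {
  local conf="$1" agent="$2" name status=0
  for name in $(sed -n "s/^${agent}\.sources[[:space:]]*=[[:space:]]*//p" "$conf"); do
    grep -q "^${agent}\.sources\.${name}\.channels[[:space:]]*=" "$conf" ||
      { echo "source $name is not bound to any channel"; status=1; }
  done
  for name in $(sed -n "s/^${agent}\.sinks[[:space:]]*=[[:space:]]*//p" "$conf"); do
    grep -q "^${agent}\.sinks\.${name}\.channel[[:space:]]*=" "$conf" ||
      { echo "sink $name is not bound to a channel"; status=1; }
  done
  return "$status"
}

# Example:
#   check_wiring example.conf a1 && echo "wiring OK"
```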
Given this configuration file, you can start Flume as follows:
$ flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console
On the test system, the command was run as shown in the image given below:
The resulting output is shown in the image given below:
Note: In a full deployment, you would typically include one more option: --conf=<conf-dir>. The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file. In this example, you pass a Java option to force Flume to log to the console and go without a custom environment script.
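The agent needs a moment to start before the netcat source accepts connections (the `Created serverSocket` line in the log output marks that point). If you script the test, you can poll the port first. A minimal sketch that relies on bash's built-in `/dev/tcp/HOST/PORT` redirection (a bash feature, not a real file); `wait_for_port` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait_for_port HOST PORT [TRIES]
# Tries to open a TCP connection once per second, up to TRIES times (default 30).
# Uses bash's /dev/tcp redirection, so it must run under bash.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i
  for ((i = 0; i < tries; i++)); do
    # The subshell opens the connection and closes it again when it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Example:
#   wait_for_port localhost 44444 && telnet localhost 44444
```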
From a separate terminal (press Ctrl+Shift+T to open a new terminal tab), you can then telnet to port 44444 and send Flume an event:
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Hello world! <Enter>
OK
This step is shown in the image given below:
The original Flume terminal will output the event in a log message.
17/01/03 14:42:21 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
17/01/03 14:42:21 INFO node.Application: Starting Sink k1
17/01/03 14:42:21 INFO node.Application: Starting Source r1
17/01/03 14:42:21 INFO source.NetcatSource: Source starting
17/01/03 14:42:21 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
17/01/03 14:45:25 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D Hello world!. }
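The LoggerSink line prints the event body as hex bytes followed by its printable text. The trailing `0D` is a carriage return: telnet sends CR LF when you press Enter, and the netcat source ends the event at the LF, so the CR remains in the body (shown as `.` in the text part). To reproduce that hex column for a given input, here is a minimal sketch using `od`; `hex_bytes` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the bytes read from stdin as space-separated,
# upper-case hex pairs, similar to the hex column of the LoggerSink output.
hex_bytes() {
  od -An -v -tx1 |          # one hex pair per byte, no address column
    tr -s ' \n' '  ' |       # join od's 16-byte rows into one line
    sed 's/^ //; s/ $//' |   # trim the leading and trailing space
    tr 'abcdef' 'ABCDEF'
}

# Example: the line telnet delivered, minus the final line feed
#   printf 'Hello world!\r' | hex_bytes
```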
On the test system, the original Flume terminal looked like the image given below:
To stop the agent, press Ctrl+C in the Flume terminal.
Congratulations! You've successfully configured and deployed a Flume agent!