Flume Installation

Using Apache Flume, you can fetch data from various services and transport it to centralized stores. In this tutorial, you will install Apache Flume and test the installation with a simple example.

Step 1: Log In as the Hadoop User

This tutorial assumes you have already installed Hadoop under a user named hduser, and it installs Flume as the same user. So the first step is to log in as hduser.

    $ su - hduser

Step 2: Download a Stable Release

Download a stable Flume binary release from the Apache Flume website. This post was tested with apache-flume-1.6.0, so the commands below use apache-flume-1.6.0-bin.tar.gz. When the browser prompts you, choose to save the file.

Step 3: Extract and Move

Go to the download folder, replacing “user” in the path with your own username, and run the following commands:

    $ cd /home/user/Downloads/
    $ sudo tar -zxvf apache-flume-1.6.0-bin.tar.gz 
    $ sudo mv apache-flume-1.6.0-bin  /usr/local/flume
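
As a quick sanity check after the move, the sketch below confirms that the directories the later steps rely on are present. The function name check_layout is made up for this example, and the list of subdirectories (bin, conf, lib) is an assumption about the binary release layout.

```shell
# check_layout DIR: report whether DIR contains bin, conf and lib.
# (check_layout is a hypothetical helper, not part of Flume.)
check_layout() {
    missing=0
    for sub in bin conf lib; do
        if [ -d "$1/$sub" ]; then
            echo "found $1/$sub"
        else
            echo "missing $1/$sub"
            missing=1
        fi
    done
    return "$missing"
}

check_layout /usr/local/flume || echo "Re-check the tar and mv commands above."
```

If any line starts with "missing", the archive was probably extracted or moved to a different location.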

Step 4: Update the .bashrc

Update the .bashrc file of hduser to export the Flume variables. The file belongs to hduser, so sudo is not needed here.

    $ nano ~/.bashrc

Copy and paste the lines below at the end of .bashrc:

    #FLUME VARIABLES START
    export FLUME_HOME=/usr/local/flume
    export FLUME_CONF_DIR=$FLUME_HOME/conf
    export FLUME_CLASSPATH=$FLUME_CONF_DIR
    export PATH=$PATH:$FLUME_HOME/bin
    #FLUME VARIABLES END

Press Ctrl+X, then Y, then Enter to save and exit.

To apply the changes to your current shell session, run the command given below (new terminals will pick them up automatically):

    $ source ~/.bashrc
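
To confirm that the variables are visible in the current shell, something like the following sketch can help. The function name check_flume_env is invented for this example; the variable names are the ones exported in .bashrc above.

```shell
# check_flume_env: print each Flume variable, or say that it is not set,
# and check that $FLUME_HOME/bin appears in PATH.
# (check_flume_env is a hypothetical helper, not part of Flume.)
check_flume_env() {
    status=0
    for name in FLUME_HOME FLUME_CONF_DIR FLUME_CLASSPATH; do
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set"
            status=1
        fi
    done
    case ":$PATH:" in
        *":${FLUME_HOME:-/nonexistent}/bin:"*) echo "PATH contains the Flume bin directory" ;;
        *) echo "PATH is missing the Flume bin directory"; status=1 ;;
    esac
    return "$status"
}

check_flume_env || echo "Run 'source ~/.bashrc' and try again."
```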

Step 5: Change Group and User

Assign an owner and a group to the Flume directory. In this case, the owner is hduser and the group is hadoop.

    $ sudo chown -R hduser:hadoop /usr/local/flume
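
One way to confirm the change is to read the owner back with stat. This is only a sketch: it assumes GNU stat (the -c option), checks just the top-level directory rather than every file, and owner_of is a name made up for this example.

```shell
# owner_of PATH: print "user:group" for a file or directory (GNU stat).
# (owner_of is a hypothetical helper.)
owner_of() {
    stat -c '%U:%G' "$1"
}

expected="hduser:hadoop"
target="/usr/local/flume"
if [ -e "$target" ] && [ "$(owner_of "$target")" = "$expected" ]; then
    echo "$target is owned by $expected"
else
    echo "$target is missing or not owned by $expected"
fi
```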

Step 6: Create flume-env.sh

To create this file, copy flume-env.sh.template to flume-env.sh using the command given below:

    $ sudo cp /usr/local/flume/conf/flume-env.sh.template /usr/local/flume/conf/flume-env.sh

Step 7: Update the flume-env.sh File

    $ cd /usr/local/flume/conf
    $ sudo nano flume-env.sh

Set the following lines in flume-env.sh, adjusting JAVA_HOME to match the Java installation on your system:

    export JAVA_OPTS="-Xms500m -Xmx1000m -Dcom.sun.management.jmxremote"
    export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Press Ctrl+X, then Y, then Enter to save and exit.

NOTE: You can check the current value of JAVA_HOME using the command given below. If it prints nothing, the variable is not set in this shell.

    $ echo $JAVA_HOME
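
If JAVA_HOME turns out to be empty, a rough way to find a candidate value is to resolve the java binary on your PATH and strip the trailing /bin/java. This is a sketch, not the only approach; java_home_from is a name invented here, and on some JDK layouts the result ends in /jre, which differs from the path used above.

```shell
# java_home_from PATH_TO_JAVA: strip the trailing /bin/java from a path.
# (java_home_from is a hypothetical helper.)
java_home_from() {
    echo "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the symlink chain from the java on PATH to the real binary.
    java_home_from "$(readlink -f "$(command -v java)")"
else
    echo "java not found on PATH"
fi
```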

Then move from the conf directory into Flume's bin directory:

    $ cd ..
    $ cd bin/

Step 8: Verify Flume Installation

    $ flume-ng

Run without arguments, flume-ng prints its usage help, which confirms that the script can be found on your PATH.
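
A slightly more explicit check is to ask Flume for its version. The sketch below uses the version command of flume-ng; the wrapper function verify_flume is made up for this example.

```shell
# verify_flume: print the Flume version if flume-ng is on PATH.
# (verify_flume is a hypothetical helper.)
verify_flume() {
    if command -v flume-ng >/dev/null 2>&1; then
        flume-ng version
    else
        echo "flume-ng not found on PATH; re-run: source ~/.bashrc"
        return 1
    fi
}

verify_flume || true
```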

A simple example

Here is an example configuration file that describes a single-node Flume deployment. This configuration lets you generate events, which Flume then logs to the console.

Create a file named example.conf:

    $ sudo gedit example.conf

Paste the following contents into it:

    # example.conf: A single-node Flume configuration

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a Flume process is launched, a flag (--name) tells it which named agent to run.
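
Because a source or sink that is not bound to a channel is an easy mistake to make, a small check like the one below can be handy before starting the agent. It is only a sketch: get_prop and check_bindings are names invented for this example, the parsing is deliberately simple (one key = value per line, # for comments), and it does not replace Flume's own validation.

```shell
# get_prop FILE KEY: print the value of KEY from a properties file.
# (get_prop and check_bindings are hypothetical helpers, not part of Flume.)
get_prop() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }' "$1"
}

# check_bindings FILE AGENT: report sources and sinks without a channel.
check_bindings() {
    status=0
    for src in $(get_prop "$1" "$2.sources"); do
        if [ -z "$(get_prop "$1" "$2.sources.$src.channels")" ]; then
            echo "source $src has no channels"
            status=1
        fi
    done
    for sink in $(get_prop "$1" "$2.sinks"); do
        if [ -z "$(get_prop "$1" "$2.sinks.$sink.channel")" ]; then
            echo "sink $sink has no channel"
            status=1
        fi
    done
    [ "$status" -eq 0 ] && echo "all sources and sinks are bound"
    return "$status"
}

# Demonstration on a temporary copy of the bindings from example.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
check_bindings "$conf" a1
rm -f "$conf"
```

Note the asymmetry the check relies on: a source uses the plural key channels, while a sink uses the singular key channel.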

Given this configuration file, you can start Flume as follows:

    $ flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Note: In a full deployment, you would typically also use the --conf= option to point Flume at a configuration directory. That directory would contain the shell script flume-env.sh and, potentially, a log4j properties file. In this example, a Java option (-Dflume.root.logger=INFO,console) forces Flume to log to the console, so the agent can run without a custom environment script.
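
To make the agent pick up the flume-env.sh edited in Step 7 no matter which directory you start it from, you could pass the configuration directory from Step 4 explicitly. The wrapper below is a sketch under that assumption; start_agent is a hypothetical name, and the fallback path is the install location used in this tutorial.

```shell
# start_agent NAME CONF_FILE: run a Flume agent with an explicit conf directory.
# (start_agent is a hypothetical helper; the flags are the ones used above.)
start_agent() {
    flume-ng agent \
        --conf "${FLUME_CONF_DIR:-/usr/local/flume/conf}" \
        --conf-file "$2" \
        --name "$1" \
        -Dflume.root.logger=INFO,console
}

# Usage, from the directory that contains example.conf:
#   start_agent a1 example.conf
```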

From a separate terminal (press Ctrl+Shift+T to open a new one), you can then telnet to port 44444 and send Flume an event:

    $ telnet localhost 44444
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Hello world! <Enter>
    OK
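
If you prefer to send an event without an interactive session, a netcat client can do the same job. The sketch below assumes an nc binary is installed and that it accepts -w as a timeout in seconds (flag behavior varies between netcat variants); send_event is a name made up for this example.

```shell
# send_event HOST PORT MESSAGE: send one newline-terminated line, then wait
# up to 2 seconds for the reply before closing.
# (send_event is a hypothetical helper.)
send_event() {
    printf '%s\n' "$3" | nc -w 2 "$1" "$2"
}

# With the agent from example.conf running:
#   send_event localhost 44444 'Hello world!'
```

Unlike telnet, this sends only a line feed at the end of the line, so the logged event body would not contain the trailing carriage return byte (0D).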

The original Flume terminal will output the event in a log message, as in the last line below:

    17/01/03 14:42:21 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
    17/01/03 14:42:21 INFO node.Application: Starting Sink k1
    17/01/03 14:42:21 INFO node.Application: Starting Source r1
    17/01/03 14:42:21 INFO source.NetcatSource: Source starting
    17/01/03 14:42:21 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
    17/01/03 14:45:25 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }
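
The logger sink prints the event body as hexadecimal bytes followed by a printable rendering. To see how the two relate, the sketch below turns the bytes from the log line back into characters; decode_hex is a name invented for this example.

```shell
# decode_hex BYTE...: print the characters for a list of two-digit hex bytes.
# (decode_hex is a hypothetical helper.)
decode_hex() {
    for byte in "$@"; do
        # Convert the hex byte to a three-digit octal escape and print it.
        printf "\\$(printf '%03o' "0x$byte")"
    done
}

# Bytes copied from the log line above; od -c makes the final byte visible.
decode_hex 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D | od -c
```

The final byte, 0D, is the carriage return that telnet sends when you press Enter, which is why the printable rendering in the log ends with a dot.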

To stop the agent, press Ctrl+C in the Flume terminal.

Congratulations, you've successfully configured and deployed a Flume agent!