Saturday, November 22, 2014

Setup Kafka in a single machine running Ubuntu 14.04 LTS

Kafka is a messaging system that can act as a buffer and feeder for messages processed by Storm spouts. It can also be used as an output buffer for Storm bolts. This post shows how to set up and test Kafka on a single machine running Ubuntu 14.04 LTS.

First, download Kafka 0.8.1.1 from the link below:

https://www.apache.org/dyn/closer.cgi?path=/kafka/0.8.1.1/kafka_2.8.0-0.8.1.1.tgz

Next, extract the kafka_2.8.0-0.8.1.1.tgz file with "tar -xvzf" and move the extracted folder to a destination of your choice (say, Documents/Works/Kafka under the user home directory):

> tar -xvzf kafka_2.8.0-0.8.1.1.tgz
> mkdir $HOME/Documents/Works/Kafka
> mv kafka_2.8.0-0.8.1.1 $HOME/Documents/Works/Kafka

Now go back to the user home folder and open the .bashrc file for editing:

> cd $HOME
> gedit .bashrc

In the .bashrc file, add the following line to the end:

export KAFKA_HOME=$HOME/Documents/Works/Kafka/kafka_2.8.0-0.8.1.1
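
The same line can also be appended from a script instead of a text editor. A minimal sketch (it writes to a scratch file here, standing in for the real .bashrc, so the example is safe to run as-is):

```shell
# Stand-in for $HOME/.bashrc so this sketch does not touch the real file.
BASHRC=$(mktemp)

LINE='export KAFKA_HOME=$HOME/Documents/Works/Kafka/kafka_2.8.0-0.8.1.1'

# Append only if the line is not already present, so re-running the
# setup does not duplicate it.
grep -qxF "$LINE" "$BASHRC" || echo "$LINE" >> "$BASHRC"

grep -c 'KAFKA_HOME' "$BASHRC"   # prints 1
```

The `grep -qxF || echo` guard makes the append idempotent, which is handy if the setup steps are collected into a script and run more than once.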

Save and close .bashrc, then run "source .bashrc" to update the environment variables. Now navigate to the Kafka home folder and edit server.properties in its "config" sub-directory:

> cd $KAFKA_HOME/config
> gedit server.properties

In the server.properties file, find the line starting with "zookeeper.connect" and change it to the following:

zookeeper.connect=192.168.2.2:2181,192.168.2.4:2181

Then find the line starting with "log.dirs" and change it to the following:

log.dirs=/var/kafka-logs

Save and close the server.properties file (192.168.2.2 and 192.168.2.4 are the zookeeper nodes). Next, create the folder /var/kafka-logs (which will store the topic and partition data for Kafka) and make it writable:

> sudo mkdir /var/kafka-logs
> sudo chmod -R 777 /var/kafka-logs
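
The two server.properties edits above can also be scripted rather than done in an editor. A sketch using sed (run here against a scratch copy with placeholder defaults, so nothing real is modified):

```shell
# Scratch copy standing in for $KAFKA_HOME/config/server.properties,
# seeded with the stock default values.
PROPS=$(mktemp)
printf 'zookeeper.connect=localhost:2181\nlog.dirs=/tmp/kafka-logs\n' > "$PROPS"

# Point the broker at the zookeeper nodes used in this post.
sed -i 's|^zookeeper.connect=.*|zookeeper.connect=192.168.2.2:2181,192.168.2.4:2181|' "$PROPS"

# Store topic/partition data under /var/kafka-logs.
sed -i 's|^log.dirs=.*|log.dirs=/var/kafka-logs|' "$PROPS"

cat "$PROPS"
```

Using `|` as the sed delimiter avoids having to escape the `/` characters in the log path.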

Now set up and run the zookeeper cluster by following the instructions at http://czcodezone.blogspot.sg/2014/11/setup-zookeeper-in-cluster.html. Once this is done, we are ready to start the kafka messaging system by running the following commands:

> cd $KAFKA_HOME
> bin/kafka-server-start.sh config/server.properties

To start testing the kafka setup, press Ctrl+Alt+T to open a new terminal and run the following command to create a topic "verification-topic" (a topic is a named entity in kafka which contains one or more partitions; partitions are message queues that can be processed in parallel, and each is serialized to its own folder under /var/kafka-logs):

> cd $KAFKA_HOME
> bin/kafka-topics.sh --create --zookeeper 192.168.2.2:2181 --topic verification-topic --partitions 1 --replication-factor 1

The above command creates a topic named "verification-topic" which contains 1 partition and has a replication factor of 1 (i.e. no extra replicas).
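
To see why partitions matter, it helps to know how a producer decides where a keyed message goes: conceptually, the key is hashed and reduced modulo the partition count. The Python sketch below illustrates the idea only; the hash function is a toy one, not the hash Kafka's clients actually use:

```python
# Simplified illustration of keyed partition assignment:
# hash the key, then take it modulo the number of partitions.
# Kafka's real clients use their own hash functions; this toy hash
# only shows why more partitions allow more parallel consumers.
def assign_partition(key: bytes, num_partitions: int) -> int:
    h = 0
    for b in key:
        h = (h * 31 + b) & 0x7FFFFFFF  # keep the hash non-negative
    return h % num_partitions

# With a single partition (as in verification-topic above),
# every message lands in partition 0.
assert assign_partition(b"any-key", 1) == 0

# With more partitions, the same key always maps to the same
# partition, so per-key ordering is preserved.
p = assign_partition(b"user-42", 3)
assert assign_partition(b"user-42", 3) == p
```

This is also why the `--partitions 1` topic created above gives strict global ordering: there is only one queue for everything.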

Now we can check the list of topics in kafka by running the following command:

> bin/kafka-topics.sh --zookeeper 192.168.2.2:2181 --list

To test the producer and consumer interaction in kafka, fire up the console producer by running

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic verification-topic

9092 is the default port for a kafka broker node (which is localhost at the moment). The terminal now enters interactive mode. Open another terminal and run the console consumer:

> bin/kafka-console-consumer.sh --zookeeper 192.168.2.2:2181 --topic verification-topic

Now type some data in the console producer terminal and you should see it displayed immediately in the console consumer terminal.