# Hadoop: Getting My Hands Dirty

Those who are about to read this post, beware: Hadoop is a big beast, with a lot of machinery churning inside. I am going to work on the Hadoop scheduler this semester, so I will be dealing with it day in and day out. Since I had never used Hadoop before, it took me some time to get it up and running and to run some test code on it.

I am making some casual notes on how I went about it. Hopefully they will help some of you in the future, and they will also serve as a ready reference for me.

• The first step is to understand what MapReduce and Hadoop are. The Wiki pages for both are good enough starting points.

• Next, to actually set up Hadoop on Ubuntu, I used this excellent tutorial (courtesy of Deepak). The procedure may differ slightly on other distributions.

• You might want to read this to understand how Hadoop can be used to implement MapReduce jobs.

• The Hadoop WordCount example is a good place to understand how a basic MapReduce job is written in Hadoop. The standard Hadoop documentation walks through this example. However, the API has changed slightly and the documentation has not been updated to reflect that :-(. For example, OutputCollector has been replaced by the Context object in the Mapper and Reducer method signatures. Here is a small summary. But mostly, you should be fine with the above-mentioned tutorial. Starting from the WordCount example, I made a trivial modification to count the number of palindromes in a set of files. Here is the code.
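To make the map, shuffle and reduce phases concrete, here is a minimal single-process sketch of the palindrome-counting idea in plain Java. It is not my actual Palindrome.java and it uses no Hadoop classes; the class and method names (`PalindromeSketch`, `map`, `shuffle`, `reduce`, `countPalindromes`) are hypothetical, and it assumes words are split on whitespace and compared case-insensitively. In a real job, the map and reduce steps live in a Mapper and a Reducer running across the cluster, and Hadoop performs the shuffle itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process sketch of a MapReduce-style palindrome count.
// No Hadoop classes are used: map(), shuffle() and reduce() only mimic the
// roles that a Hadoop Mapper, the framework's shuffle, and a Reducer play.
class PalindromeSketch {

    // Two-pointer check: compare characters from both ends moving inward.
    static boolean isPalindrome(String word) {
        int i = 0;
        int j = word.length() - 1;
        while (i < j) {
            if (word.charAt(i++) != word.charAt(j--)) {
                return false;
            }
        }
        return true;
    }

    // Map step: like WordCount's mapper, split a line on whitespace, but
    // emit (word, 1) only when the lower-cased word is a palindrome.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty() && isPalindrome(token)) {
                emitted.add(Map.entry(token, 1));
            }
        }
        return emitted;
    }

    // Shuffle step: group emitted values by key (Hadoop does this for you
    // between the map and reduce phases). TreeMap keeps the keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the 1s for each palindrome, as WordCount's reducer does.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int value : group.getValue()) {
                sum += value;
            }
            counts.put(group.getKey(), sum);
        }
        return counts;
    }

    // Runs every input line through map, then shuffles and reduces once.
    static Map<String, Integer> countPalindromes(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(shuffle(emitted));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("level noon hadoop", "Noon racecar map");
        System.out.println(countPalindromes(lines)); // prints {level=1, noon=2, racecar=1}
    }
}
```

The only real change from WordCount is the `isPalindrome` filter in the map step; the reduce step is the same summation, which is why the modification is so small.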

This is the sequence of steps I followed to get my job running:

1. `$HADOOP_HOME/bin/start-all.sh`
2. `javac -classpath $HADOOP_HOME/hadoop-*-core.jar Palindrome.java`
3. `jar -cvfe /path/for/jar/file/Palindrome.jar Palindrome Palindrome.class Palindrome\$PalindromeMapper.class Palindrome\$PalindromeReducer.class`
4. `$HADOOP_HOME/bin/hadoop dfs -copyFromLocal /path/to/input /user/hadoop/input/path`
5. `$HADOOP_HOME/bin/hadoop jar /path/for/jar/file/Palindrome.jar /user/hadoop/input/path /user/hadoop/output/path`

The `e` flag in step 3 records `Palindrome` as the jar's entry point, which is why step 5 does not need to name the main class.
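Once step 5 finishes, the results sit in the output directory as plain text. As a small convenience, here is a hypothetical Java helper (`OutputTotals` is my own name, not part of Hadoop) that adds up the counts. It assumes the job writes WordCount-style records through the default `TextOutputFormat`, i.e. one `key<TAB>count` line per palindrome, and it reads those lines from standard input so the output can be piped into it, for example from `$HADOOP_HOME/bin/hadoop dfs -cat /user/hadoop/output/path/part-*`.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.stream.Stream;

// Hypothetical helper: totals the counts in a job's text output.
// Assumes one "key<TAB>count" record per line (WordCount-style output
// written by the default TextOutputFormat); blank lines are skipped.
class OutputTotals {

    static long sumCounts(Stream<String> lines) {
        return lines
                .filter(line -> !line.trim().isEmpty())
                // The count is whatever follows the last tab on the line.
                .mapToLong(line -> Long.parseLong(line.substring(line.lastIndexOf('\t') + 1).trim()))
                .sum();
    }

    // Usage (assumed): hadoop dfs -cat /user/hadoop/output/path/part-* | java OutputTotals
    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        System.out.println(sumCounts(in.lines()));
    }
}
```

If the job instead emits a single total, the helper still works, since it simply sums whatever counts it finds.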