Using Counters in MapReduce to Track Bad Records

By Ravi Karamsetty | September 3, 2014

The MapReduce framework provides Counters as an efficient mechanism for tracking how often global events occur during the map and reduce phases of a job. For example, a typical MapReduce job launches several mapper instances, one for each block of the input data, all running the same code. These instances are part of
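As a rough sketch of bad-record tracking, the Python mapper below assumes a hypothetical comma-separated input layout (`user_id,timestamp,amount`) and a hypothetical counter group named `DataQuality`. Instead of failing the task on a malformed line, it tallies the problem and skips the record. In the Java API the same idea is expressed with `context.getCounter(...).increment(1)`; this sketch targets Hadoop Streaming instead, which treats lines of the form `reporter:counter:<group>,<counter>,<amount>` written to a task's standard error as counter updates.

```python
import sys
from collections import Counter

# Hypothetical input layout, assumed for illustration: "user_id,timestamp,amount"
EXPECTED_FIELDS = 3
# Hypothetical counter group name; any string works.
COUNTER_GROUP = "DataQuality"


def run_mapper(lines, out, err):
    """Emit user_id<TAB>amount for valid lines; count bad lines instead of failing."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) != EXPECTED_FIELDS:
            counts["MALFORMED_RECORDS"] += 1  # wrong number of fields
            continue
        user_id, _timestamp, amount = fields
        try:
            value = float(amount)
        except ValueError:
            counts["BAD_AMOUNT"] += 1  # amount column is not numeric
            continue
        counts["GOOD_RECORDS"] += 1
        out.write(f"{user_id}\t{value}\n")
    # Hadoop Streaming interprets stderr lines of the form
    # reporter:counter:<group>,<counter>,<amount> as counter increments.
    for name, amount in sorted(counts.items()):
        err.write(f"reporter:counter:{COUNTER_GROUP},{name},{amount}\n")
    return counts


if __name__ == "__main__":
    run_mapper(sys.stdin, sys.stdout, sys.stderr)
```

Tallying locally and writing each counter once at the end keeps stderr traffic small; the framework then sums the increments from every mapper instance, so the job-level totals show how many records were skipped across the whole input.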

Using MRUnit to Develop and Test MapReduce Jobs

By Ravi Karamsetty | August 28, 2014

Conceptually, MapReduce jobs are relatively simple. In the map phase, a function is applied to each input record, producing zero or more key-value pairs. The reduce phase receives each key together with all the values emitted for it and applies a function over that group. Testing mappers and reducers should be as easy as testing any other
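To make the map/reduce contract above concrete, here is a word-count sketch written as plain Python functions, with a tiny in-memory shuffle standing in for the framework. MRUnit itself is a Java library, so this is not its API; it only illustrates the same idea of exercising a mapper and a reducer in isolation against fixed inputs and exact expected outputs, with no cluster involved. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def word_count_map(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: one input record -> zero or more (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def word_count_reduce(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: one key plus every value emitted for it -> (word, total)."""
    return word, sum(counts)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """In-memory stand-in for the framework: map, group values by key, reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in word_count_map(line):
            groups[key].append(value)
    return dict(word_count_reduce(k, v) for k, v in sorted(groups.items()))


# Unit tests in the spirit of MRUnit: fixed input, exact expected output.
def test_mapper() -> None:
    assert list(word_count_map("The cat the")) == [("the", 1), ("cat", 1), ("the", 1)]


def test_reducer() -> None:
    assert word_count_reduce("the", [1, 1, 1]) == ("the", 3)


def test_round_trip() -> None:
    assert run_local(["the cat", "The dog"]) == {"cat": 1, "dog": 1, "the": 2}


if __name__ == "__main__":
    test_mapper()
    test_reducer()
    test_round_trip()
    print("all tests passed")
```

Keeping the map and reduce logic in pure functions means each phase can be checked on its own, and the round-trip test covers the grouping step that the real framework would perform.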

Hadoop MapReduce Streaming Using a Bash Script

By Ravi Karamsetty | August 5, 2014

Hadoop MapReduce includes a feature called Hadoop Streaming that lets you write mappers and reducers in languages other than Java. Any executable that reads records from standard input and writes key-value pairs to standard output can act as a mapper or reducer, so you can use Ruby, Perl, Python, or even a quick shell script.
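A minimal Streaming-style word count, sketched in Python rather than Bash so it can be tested directly. The mapper writes `key<TAB>value` lines (tab is Streaming's default key/value separator), and the reducer relies on the framework sorting map output by key, so all lines for one word arrive consecutively. The file name `wordcount.py` and its `map`/`reduce` argument are assumptions of this sketch.

```python
import sys
from itertools import groupby


def mapper(lines, out):
    """Emit one 'word<TAB>1' line per word read from the input."""
    for line in lines:
        for word in line.split():
            out.write(f"{word}\t1\n")


def reducer(lines, out):
    """Sum counts per word. Input must be sorted by key, as Streaming guarantees,
    so groupby sees each word's lines as one consecutive run."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        out.write(f"{word}\t{total}\n")


def main(args):
    # Hypothetical CLI convention for this sketch: pass "map" or "reduce".
    if args[:1] == ["map"]:
        mapper(sys.stdin, sys.stdout)
    elif args[:1] == ["reduce"]:
        reducer(sys.stdin, sys.stdout)
    else:
        print("usage: wordcount.py map|reduce", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv[1:])
```

You can approximate the framework locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` plays the role of the shuffle. On a cluster, the same two commands would be passed to the streaming jar's `-mapper` and `-reducer` options, with `-files wordcount.py` shipping the script to the task nodes; the jar's location varies by Hadoop version and distribution.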

