Using Counters in MapReduce to Track Bad Records

By Ravi Karamsetty | September 3, 2014

The MapReduce framework provides Counters as an efficient mechanism for tracking occurrences of global events within the map and reduce phases of a job. For example, a typical MapReduce job will kick off several mapper instances, one for each block of the input data, all running the same code. These instances are part of …
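As a rough illustration of counting bad records, here is a minimal sketch written as a Hadoop Streaming mapper in Python rather than with the Java Counter API the post is about. It relies on the Streaming convention that a line of the form `reporter:counter:<group>,<counter>,<amount>` written to stderr increments a job counter. The group name `RecordQuality`, the counter name `BAD_RECORDS`, and the three-field tab-separated record layout are assumptions made up for this example.

```python
import sys

GROUP = "RecordQuality"   # hypothetical counter group name
COUNTER = "BAD_RECORDS"   # hypothetical counter name


def process(line, expected_fields=3):
    """Return (output_record, counter_update) for one input line.

    A record is treated as bad when it does not have the expected number
    of tab-separated fields or when its first field is empty.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected_fields or not fields[0]:
        # Hadoop Streaming reads counter updates in this format from stderr.
        return None, f"reporter:counter:{GROUP},{COUNTER},1"
    return f"{fields[0]}\t1", None


def main(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Driver loop: good records go to stdout, counter updates to stderr."""
    for line in stdin:
        record, counter_update = process(line)
        if counter_update is not None:
            print(counter_update, file=stderr)
        if record is not None:
            print(record, file=stdout)
```

Calling `main()` at the bottom of the script turns it into a mapper. Because every mapper instance reports to the same named counter, the framework aggregates the updates into a single job-wide total of bad records.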


Using MRUnit to Develop and Test MapReduce Jobs

By Ravi Karamsetty | August 28, 2014

Conceptually, MapReduce jobs are relatively simple. In the map phase, each input record has a function applied to it, resulting in one or more key-value pairs. The reduce phase receives a group of key-value pairs and performs some function over that group. Testing mappers and reducers should be as easy as testing any other …
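The two phases described above can be sketched in plain Python. This is not MRUnit (a Java library) and not Hadoop code; it is a small in-memory stand-in, assuming a word-count job, that shows why a mapper and a reducer can be tested like any other function. The names `map_record`, `reduce_group`, and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_record(line):
    """Map phase: turn one input record into zero or more (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_group(key, values):
    """Reduce phase: collapse all values that share a key into one result."""
    return key, sum(values)


def run_job(records):
    """In-memory stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_group(key, values) for key, values in sorted(groups.items()))
```

Each function takes ordinary values and returns ordinary values, so a unit test only needs to feed in a record (or a key with its grouped values) and compare the result, without starting a cluster.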


Hadoop MapReduce Streaming Using a Bash Script

By Ravi Karamsetty | August 5, 2014

MapReduce has a feature known as Hadoop Streaming that lets you write jobs in languages other than Java. You can use Ruby, Perl, or Python, or even quickly write a MapReduce job as a shell script.
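To make the Streaming model concrete, here is a minimal sketch of a reducer in Python (one of the languages mentioned above, used here instead of Bash). Streaming passes records to the script as lines of text on stdin and reads results from stdout; by default the key and value are separated by a tab, and the reducer's input arrives sorted by key. The function names and the assumption that every value is an integer count are made up for this example.

```python
import sys
from itertools import groupby


def parse(lines):
    """Split each 'key<TAB>value' line into a (key, int value) pair."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, int(value)


def reduce_lines(lines):
    """Yield one 'key<TAB>total' line per key.

    Relies on the input being sorted by key, so equal keys are adjacent
    and groupby can collect them in a single pass.
    """
    for key, pairs in groupby(parse(lines), key=lambda pair: pair[0]):
        yield f"{key}\t{sum(value for _, value in pairs)}"


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Driver loop: read sorted pairs from stdin, write totals to stdout."""
    for result in reduce_lines(stdin):
        print(result, file=stdout)
```

With `main()` called at the bottom, the script can be passed to the Hadoop Streaming jar through its `-reducer` option, alongside `-mapper`, `-input`, and `-output`. The same stdin/stdout contract is what makes a shell script usable in either role.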
