Using MRUnit to Develop and Test MapReduce Jobs

Conceptually, MapReduce jobs are relatively simple. In the map phase, a function is applied to each input record, producing zero or more key-value pairs. The reduce phase receives the key-value pairs grouped by key and applies some function to each group. Testing mappers and reducers should therefore be as easy as testing any other function: a given input produces an expected output. The complexity comes from the distributed nature of Hadoop, a large framework with many moving parts. Before Cloudera released MRUnit, even the simplest tests running in local mode had to read from disk and took several seconds each to set up and run.
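To make the "just another function" point concrete, here is a minimal sketch in plain Java with no Hadoop types involved. The class name WordCountFunctions and its map and reduce methods are hypothetical stand-ins chosen for illustration, with word counting as the example job; they are not part of Hadoop or MRUnit.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative only: plain-Java stand-ins for a mapper and a reducer.
final class WordCountFunctions {

    // "Map" step: one input record in, zero or more key-value pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: one key plus all of its grouped values in, one result out.
    static Map.Entry<String, Integer> reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return new SimpleEntry<>(key, sum);
    }

    public static void main(String[] args) {
        // Each function can be exercised directly, with no cluster or file system.
        System.out.println(map("to be or not to be"));
        System.out.println(reduce("to", Arrays.asList(1, 1)));
    }
}
```

Because both steps are ordinary functions of their inputs, a unit test only has to call them and compare the result with the expected pairs. Real Hadoop mappers and reducers write to a collector or context instead of returning a value, which is the part a test harness has to stand in for.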

MRUnit strips away as much of the Hadoop framework as possible during development and testing. The focus narrows to the map and reduce code, their inputs, and their expected outputs. With MRUnit, MapReduce code can be developed and tested entirely in the IDE, and the tests take fractions of a second to run.

The following example uses MRUnit to test the IdentityMapper that ships with the MapReduce framework in its lib package (org.apache.hadoop.mapred.lib). The IdentityMapper takes a key-value pair as input and emits the same key-value pair, unchanged.

package com.hadoop.mrunit.test;

import junit.framework.TestCase;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mrunit.MapDriver;

public class IdentityMapperTest extends TestCase {
 private Mapper<Text, Text, Text, Text> mapper;
 private MapDriver<Text, Text, Text, Text> driver;

 // JUnit 3 style: setUp() runs before every test method.
 @Override
 public void setUp() {
  mapper = new IdentityMapper<Text, Text>();
  driver = new MapDriver<Text, Text, Text, Text>(mapper);
 }

 // Passes: the expected output is identical to the input.
 public void testIdentityMapper1() throws Exception {
  driver.withInput(new Text("key"), new Text("value"))
   .withOutput(new Text("key"), new Text("value"))
   .runTest();
 }

 // Fails: the expected output differs from what the IdentityMapper emits.
 public void testIdentityMapper2() throws Exception {
  driver.withInput(new Text("key"), new Text("value"))
   .withOutput(new Text("key2"), new Text("value2"))
   .runTest();
 }
}

MRUnit is built on top of the popular JUnit testing framework. It uses the object-mocking library Mockito to mock most of the essential Hadoop objects, so the user only needs to focus on the map and reduce logic. The MapDriver class runs the test and is instantiated with a Mapper. The withInput() method supplies the input for that Mapper, and the withOutput() method supplies the expected output against which the results are validated. The call to runTest() actually invokes the mapper, passing it the inputs and comparing its outputs with those given to withOutput().
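The driver pattern described above can be modelled in a few lines of plain Java. This is a hypothetical miniature, not MRUnit's actual implementation: the class TinyMapDriver, its use of String keys and values, and the identity helper are all assumptions made for illustration. It only shows the record-inputs, record-expectations, run-and-compare flow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical miniature of the driver pattern; NOT MRUnit's real code.
final class TinyMapDriver {
    private final BiFunction<String, String, List<String[]>> mapper;
    private final List<String[]> expected = new ArrayList<>();
    private String inputKey;
    private String inputValue;

    TinyMapDriver(BiFunction<String, String, List<String[]>> mapper) {
        this.mapper = mapper;
    }

    // Records the single input pair that will be handed to the mapper.
    TinyMapDriver withInput(String key, String value) {
        this.inputKey = key;
        this.inputValue = value;
        return this;
    }

    // Records one expected output pair; may be called several times.
    TinyMapDriver withOutput(String key, String value) {
        expected.add(new String[] {key, value});
        return this;
    }

    // Invokes the mapper and compares actual with expected output, in order.
    void runTest() {
        List<String[]> actual = mapper.apply(inputKey, inputValue);
        if (actual.size() != expected.size()) {
            throw new AssertionError("Expected " + expected.size()
                + " pairs but got " + actual.size());
        }
        for (int i = 0; i < expected.size(); i++) {
            if (!Arrays.equals(expected.get(i), actual.get(i))) {
                throw new AssertionError("Mismatch at pair " + i + ": expected "
                    + Arrays.toString(expected.get(i)) + " but got "
                    + Arrays.toString(actual.get(i)));
            }
        }
    }

    // Stand-in for an identity mapper: emits the input pair unchanged.
    static List<String[]> identity(String key, String value) {
        return Collections.singletonList(new String[] {key, value});
    }

    public static void main(String[] args) {
        new TinyMapDriver(TinyMapDriver::identity)
            .withInput("key", "value")
            .withOutput("key", "value")
            .runTest(); // completes without throwing
    }
}
```

The fluent methods return the driver itself, which is what lets a whole test read as a single chained expression. Nothing is executed until runTest() is called, so leaving it off produces a test that never checks anything.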

This post only covered testing a mapper. MRUnit also provides a ReduceDriver class that can be used in the same way as MapDriver to test reducers.