Passing Parameters and Arguments to Mapper and Reducer in Hadoop

In Hadoop, it is sometimes difficult to pass arguments to mappers and reducers. If the number of arguments is huge (e.g., big arrays), DistributedCache might be a good choice. However, here, we’re discussing small arguments, usually a hand of configuration parameters.

In fact, the way to configure these parameters is simple. When you initialize “JobConf” object to launch a mapreduce job, you can set the parameter by using “set” method like:

1
2
JobConf job = (JobConf)getConf();
job.set("NumberOfDocuments", args[0]);

Here, “NumberOfDocuments” is the name of parameter and its value is read from “args[0]“, a command line argument. Once you set this arguments, you can retrieve its value in reducer or mapper as follows:

1
2
3
4
private static Long N;
public void configure(JobConf job) {
     N = Long.parseLong(job.get("NumberOfDocuments"));
}

Note, the tricky part is that you cannot set parameters like this:

1
2
Configuration con = new Configuration();
con.set("NumberOfDocuments", args[0]);

and hope that all mappers or reducers can retrieve this parameter. This will fail in running.

Leave a Comment


NOTE - You can use these HTML tags and attributes:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>