Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

In new API (apache.hadoop.mapreduce.KeyValueTextInputFormat) , how to specify separator (delimiter) other than tab(which is default) to separate key and Value.

Sample Input :

one,first line
two,second line

Ouput Required :

Key : one
Value : first line
Key : two
Value : second line

I am specifying KeyValueTextInputFormat as :

    Job job = new Job(conf, "Sample");

    job.setInputFormatClass(KeyValueTextInputFormat.class);
    KeyValueTextInputFormat.addInputPath(job, new Path("/home/input.txt"));

This is working fine for tab as a separator.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
198 views
Welcome To Ask or Share your Answers For Others

1 Answer

In the newer API you should use mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property.

Here's an example:

Configuration conf = new Configuration();
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

Job job = new Job(conf);
job.setInputFormatClass(KeyValueTextInputFormat.class);
// next job set-up

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...