How to convert .txt file to Hadoop's sequence file format

独厮守ぢ 2020-11-29 01:19

To use MapReduce jobs in Hadoop effectively, I need the data to be stored in Hadoop's SequenceFile format. However, the data is currently only in flat .txt format. Can anyo

7 answers
  •  时光说笑
    2020-11-29 01:57

    The simplest answer is an "identity" job (one whose mapper passes records through unchanged) that writes SequenceFile output.

    In Java it looks like this:

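        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

        public class ConvertText {
            public static void main(String[] args) throws IOException,
                    InterruptedException, ClassNotFoundException {

                Configuration conf = new Configuration();
                Job job = Job.getInstance(conf, "Convert Text");
                job.setJarByClass(ConvertText.class);

                // the base Mapper and Reducer classes pass records through unchanged
                job.setMapperClass(Mapper.class);
                job.setReducerClass(Reducer.class);

                // 0 reducers = map-only job; increase if you need sorting or a special number of files
                job.setNumReduceTasks(0);

                job.setOutputKeyClass(LongWritable.class);
                job.setOutputValueClass(Text.class);

                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(SequenceFileOutputFormat.class);

                // example paths; replace with your own input and output locations
                TextInputFormat.addInputPath(job, new Path("/lol"));
                SequenceFileOutputFormat.setOutputPath(job, new Path("/lolz"));

                // submit and wait for completion; exit non-zero if the job fails
                System.exit(job.waitForCompletion(true) ? 0 : 1);
            }
        }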
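
    A note on why the output key is LongWritable and the value is Text: TextInputFormat gives the mapper one record per line, keyed by the byte offset at which the line starts, and the identity mapper writes those pairs straight into the SequenceFile. The sketch below imitates that pairing in plain Java with no Hadoop dependency, so you can see what the records should look like. The class name OffsetLineDemo and the method records are made up for illustration, and it assumes UTF-8 text with '\n' line endings only.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.AbstractMap.SimpleEntry;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // Hypothetical helper, not part of Hadoop: imitates the (byte offset, line)
    // records that TextInputFormat hands to the mapper.
    public class OffsetLineDemo {

        // Assumes UTF-8 input and '\n' as the only line terminator.
        static List<Map.Entry<Long, String>> records(String content) {
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            List<Map.Entry<Long, String>> out = new ArrayList<>();
            int start = 0; // byte offset where the current line begins
            for (int i = 0; i <= bytes.length; i++) {
                boolean atEnd = (i == bytes.length);
                // emit a record at each newline, and at end of input if a
                // final line without a trailing newline is still pending
                if (atEnd ? start < i : bytes[i] == '\n') {
                    String line = new String(bytes, start, i - start, StandardCharsets.UTF_8);
                    out.add(new SimpleEntry<>(Long.valueOf(start), line));
                    start = i + 1;
                }
            }
            return out;
        }

        public static void main(String[] args) {
            // prints each record as: offset <TAB> line
            for (Map.Entry<Long, String> r : records("first line\nsecond line\n")) {
                System.out.println(r.getKey() + "\t" + r.getValue());
            }
        }
    }
    ```

    If you do not need the offsets as keys, you would have to write your own mapper that emits a different key; the identity job keeps whatever TextInputFormat produces.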
    
