How to convert .txt file to Hadoop's sequence file format

独厮守ぢ 2020-11-29 01:19

To make effective use of MapReduce jobs in Hadoop, I need the data to be stored in Hadoop's sequence file format. However, the data is currently only in flat .txt format. Can anyo

7 answers
  •  慢半拍i (OP)
    2020-11-29 02:07

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // White, Tom (2012-05-10). Hadoop: The Definitive Guide (Kindle Locations 5375-5384). O'Reilly Media - A. Kindle Edition.

    // Writes 100 sample (IntWritable, Text) records to the sequence file named by args[0].
    // To convert a .txt file, append the lines read from that file instead of cycling through DATA.
    public class SequenceFileWriteDemo {

        private static final String[] DATA = {
            "One, two, buckle my shoe",
            "Three, four, shut the door",
            "Five, six, pick up sticks",
            "Seven, eight, lay them straight",
            "Nine, ten, a big fat hen"
        };

        public static void main(String[] args) throws IOException {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            Path path = new Path(uri);

            // Writable instances are reused for every record.
            IntWritable key = new IntWritable();
            Text value = new Text();
            SequenceFile.Writer writer = null;
            try {
                writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
                for (int i = 0; i < 100; i++) {
                    key.set(100 - i);
                    value.set(DATA[i % DATA.length]);
                    // Print the current file position, then the record about to be written.
                    System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                    writer.append(key, value);
                }
            } finally {
                // Safe even if createWriter failed and writer is still null.
                IOUtils.closeStream(writer);
            }
        }
    }
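
The demo above writes built-in sample strings, so here is a minimal sketch of the same idea applied to an actual .txt file: each line becomes one record, keyed by its zero-based line number. The class name `TextToSequenceFile`, the `convert` helper, and the choice of `LongWritable` line numbers as keys are my own assumptions for illustration, not something from the book or the question. It also assumes Hadoop 2.x or later, since it uses the option-based `SequenceFile.createWriter(conf, ...)` overload rather than the older one shown above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper class (not part of Hadoop): converts a text file to a SequenceFile.
public class TextToSequenceFile {

    // Copies every line of `in` into the sequence file `out` as
    // (zero-based line number, line text). Returns the number of records written.
    public static long convert(Configuration conf, Path in, Path out) throws IOException {
        FileSystem fs = in.getFileSystem(conf);
        LongWritable key = new LongWritable();
        Text value = new Text();
        long count = 0;
        // Both the reader and the writer are closed automatically, even on failure.
        try (BufferedReader reader = new BufferedReader(
                 new InputStreamReader(fs.open(in), StandardCharsets.UTF_8));
             SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                 SequenceFile.Writer.file(out),
                 SequenceFile.Writer.keyClass(LongWritable.class),
                 SequenceFile.Writer.valueClass(Text.class))) {
            String line;
            while ((line = reader.readLine()) != null) {
                key.set(count);
                value.set(line);
                writer.append(key, value);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input .txt path, args[1]: output sequence file path
        long written = convert(new Configuration(), new Path(args[0]), new Path(args[1]));
        System.out.println("Records written: " + written);
    }
}
```

This runs as a single client-side process, which is fine for files that one machine can stream through. For very large inputs, an identity MapReduce job that reads with `TextInputFormat` and writes with `SequenceFileOutputFormat` may be the better fit.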
    
