How to convert .txt file to Hadoop's sequence file format

Asked by 独厮守ぢ · 2020-11-29 01:19

To effectively utilise map-reduce jobs in Hadoop, I need data to be stored in Hadoop's sequence file format. However, currently the data is only in flat .txt format. Can anyone suggest a way to convert a flat .txt file to a sequence file?

7 Answers
  •  谎友^ (OP)
     2020-11-29 02:21

    You can also just create an intermediate table, LOAD DATA the CSV contents straight into it, then create a second table stored as sequencefile (partitioned, clustered, etc.) and insert into it with a select from the intermediate table. You can also set options for compression, e.g.,

    set hive.exec.compress.output = true;
    set io.seqfile.compression.type = BLOCK;
    set mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
    
    create table... stored as sequencefile;
    
    insert overwrite table ... select * from ...;
    

    The MR framework will then take care of the heavy lifting for you, saving you the trouble of having to write Java code.
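    Put together, the steps above might look like the following HiveQL sketch. The table names, columns, delimiter, and file path here are hypothetical placeholders, not from the original question:

    ```sql
    -- Intermediate table holding the raw flat-file data (hypothetical schema)
    CREATE TABLE staging_txt (id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Load the flat .txt contents straight into the intermediate table
    LOAD DATA LOCAL INPATH '/tmp/data.txt' INTO TABLE staging_txt;

    -- Enable block-level compressed sequence file output
    SET hive.exec.compress.output = true;
    SET io.seqfile.compression.type = BLOCK;
    SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;

    -- Final table stored in sequence file format
    CREATE TABLE data_seq (id INT, name STRING)
    STORED AS SEQUENCEFILE;

    -- The insert runs as a map-reduce job that writes the sequence files
    INSERT OVERWRITE TABLE data_seq
    SELECT * FROM staging_txt;
    ```

    The final table's files under the warehouse directory are then ordinary Hadoop sequence files that other map-reduce jobs can read directly.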
