To effectively utilise map-reduce jobs in Hadoop, i need data to be stored in hadoop\'s sequence file format. However,currently the data is only in flat .txt format.Can anyo
You can also just create an intermediate table, LOAD DATA the csv contents straight into it, then create a second table as sequencefile (partitioned, clustered, etc..) and insert into select from the intermediate table. You can also set options for compression, e.g.,
set hive.exec.compress.output = true;
set io.seqfile.compression.type = BLOCK;
set mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
create table... stored as sequencefile;
insert overwrite table ... select * from ...;
The MR framework will then take care of the heavylifting for you, saving you the trouble of having to write Java code.