Question
Can I write output to multiple tables in HBase from my reducer? I went through several blog posts, but I have not been able to find a way, even using MultiTableOutputFormat.
I referred to this: Write to multiple tables in HBASE
But I am not able to figure out the API signature for the context.write call.
Reducer code:
public class MyReducer extends TableReducer<Text, Result, Put> {
    private static final Logger logger = Logger.getLogger( MyReducer.class );

    @SuppressWarnings( "deprecation" )
    @Override
    protected void reduce( Text key, Iterable<Result> data, Context context ) throws IOException, InterruptedException {
        logger.info( "Working on ---> " + key.toString() );
        for ( Result res : data ) {
            Put put = new Put( res.getRow() );
            KeyValue[] raw = res.raw();
            for ( KeyValue kv : raw ) {
                put.add( kv );
            }
            // I don't know how to give the table name here.
            context.write( obj, put );
        }
    }
}
Answer 1:
To choose the target table, pass the table name as the key in the context.write(key, put) call:
ImmutableBytesWritable key = new ImmutableBytesWritable(Bytes.toBytes("tableName"));
context.write(key, put);
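Since the key decides the destination, writing to several tables from one reducer comes down to computing a different key per record. Below is a minimal, dependency-free sketch of that routing step. The TableRouter class, the prefix rule, and the table names "users", "events" and "misc" are all hypothetical and only illustrate the idea; they are not from the original post or from HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decides which HBase table a row should be written to.
final class TableRouter {

    // Assumed routing rule (illustration only): row-key prefix -> table name.
    private static final Map<String, String> PREFIX_TO_TABLE = new LinkedHashMap<>();
    static {
        PREFIX_TO_TABLE.put("user:", "users");
        PREFIX_TO_TABLE.put("event:", "events");
    }

    // Assumed fallback table for rows that match no prefix.
    private static final String DEFAULT_TABLE = "misc";

    private TableRouter() {
    }

    // Returns the name of the table for the given row key.
    static String tableNameFor(String rowKey) {
        for (Map.Entry<String, String> entry : PREFIX_TO_TABLE.entrySet()) {
            if (rowKey.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return DEFAULT_TABLE;
    }

    // Returns the table name as UTF-8 bytes, the same encoding that
    // Bytes.toBytes(String) produces, ready to wrap in an ImmutableBytesWritable.
    static byte[] tableKeyFor(String rowKey) {
        return tableNameFor(rowKey).getBytes(StandardCharsets.UTF_8);
    }
}
```

Inside the reducer, the result would then be used as context.write(new ImmutableBytesWritable(TableRouter.tableKeyFor(rowKey)), put), with the reducer's output key type declared as ImmutableBytesWritable and the job's output format set to MultiTableOutputFormat.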
If you need to load a large amount of data in a single MapReduce job, however, consider using MultiTableHFileOutputFormat. This output format creates HFiles for every HBase table you need, and you can then load those files with the LoadIncrementalHFiles tool:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/multiTableJobResult hbaseTable
You can read more about MultiTableHFileOutputFormat in this article: http://tech.adroll.com/blog/data/2014/07/15/multi-table-bulk-import.html
Source: https://stackoverflow.com/questions/37436095/write-output-to-multiple-tables-from-reducer