Question
I want to bulk load data into multiple HBase tables using a single MapReduce job. Since the data volume is high, it would be time-consuming to iterate through the dataset twice and load it using multiple jobs. Is there any way to do this? Thanks in advance.
Answer 1:
I use HBase, but I haven't needed bulk load yet. I did come across this article, which might help you:
http://hbase.apache.org/book/arch.bulk.load.html
The bulk load feature uses a MapReduce job to output table data in HBase's internal data format, and then directly loads the generated StoreFiles into a running cluster. Using bulk load will use less CPU and network resources than simply using the HBase API.
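To make the single-pass idea from the question concrete, here is a minimal plain-Java sketch. It deliberately uses no HBase classes: the `Row` type, the `route` method, and the table names are all hypothetical and only for illustration. It reads the input once and routes each record into a per-table buffer that is kept sorted by row key, since HFiles have to be written in row-key order. A real job would do this routing inside MapReduce and write StoreFiles instead of in-memory maps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: models "one pass, many target tables" without any HBase API.
class MultiTableRouting {

    // Hypothetical input record: target table name, row key, and one cell value.
    static final class Row {
        final String table;
        final String rowKey;
        final String value;

        Row(String table, String rowKey, String value) {
            this.table = table;
            this.rowKey = rowKey;
            this.value = value;
        }
    }

    // Single pass over the input. Each row goes into the buffer of its target table.
    // TreeMap keeps row keys sorted (for ASCII keys this matches byte order).
    static Map<String, TreeMap<String, String>> route(List<Row> rows) {
        Map<String, TreeMap<String, String>> perTable = new TreeMap<>();
        for (Row r : rows) {
            perTable.computeIfAbsent(r.table, t -> new TreeMap<>())
                    .put(r.rowKey, r.value);
        }
        return perTable;
    }

    public static void main(String[] args) {
        List<Row> input = Arrays.asList(
                new Row("users", "u2", "bob"),
                new Row("orders", "o1", "book"),
                new Row("users", "u1", "alice"));

        // Print each table's rows in the order they would have to be written.
        route(input).forEach((table, sorted) ->
                System.out.println(table + " -> " + sorted));
    }
}
```

The point of the sketch is only that the dataset is iterated once and the table name travels with each record; how the per-table output is then loaded into the cluster depends on the HBase version and tooling you have.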
Source: https://stackoverflow.com/questions/19079370/bulk-load-to-multiple-hbase-tables-in-single-job