Set HBase properties for a Spark job using spark-submit

Submitted by 一曲冷凌霜 on 2021-01-28 05:26:16

Question


During an HBase data migration I encountered a java.lang.IllegalArgumentException: KeyValue size too large

In the long term:

I need to increase the property hbase.client.keyvalue.maxsize (from 1048576 to 10485760) in /etc/hbase/conf/hbase-site.xml, but I can't change this file right now (the change needs validation).
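
For reference, the long-term hbase-site.xml change would look something like this (a sketch using only the property name and value mentioned above):

<property>
  <name>hbase.client.keyvalue.maxsize</name>
  <value>10485760</value>
</property>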

In the short term:

I succeeded in importing data with this command:

hbase org.apache.hadoop.hbase.mapreduce.Import \
  -Dhbase.client.keyvalue.maxsize=10485760 \
  myTable \
  myBackupFile

Now I need to run a Spark job using spark-submit.

Which is the better way:

  • Prefix the HBase property with 'spark.' (I'm not sure whether this is possible, or whether it works)
spark-submit \
  --conf spark.hbase.client.keyvalue.maxsize=10485760
  • Use 'spark.executor.extraJavaOptions' and 'spark.driver.extraJavaOptions' to pass the HBase property explicitly
spark-submit \
  --conf spark.executor.extraJavaOptions=-Dhbase.client.keyvalue.maxsize=10485760 \
  --conf spark.driver.extraJavaOptions=-Dhbase.client.keyvalue.maxsize=10485760

Answer 1:


If you can change your code, you should be able to set these properties programmatically. I think something like this worked for me in the past in Java:

Configuration conf = HBaseConfiguration.create();
// Set properties BEFORE creating the Connection below:
conf.set("hbase.client.scanner.timeout.period", SCAN_TIMEOUT);
Connection conn = ConnectionFactory.createConnection(conf);
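
Applied to the property from the question, here is a minimal self-contained sketch (the class name is illustrative; the value 10485760 is the one already used with the Import tool, and the HBase classes are the standard client API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class HBaseMaxKeyValueSize {
    public static void main(String[] args) throws Exception {
        // Starts from the hbase-site.xml found on the classpath, then overrides in code.
        Configuration conf = HBaseConfiguration.create();
        // Raise the client-side KeyValue limit to 10 MB, as done with the Import tool above.
        conf.set("hbase.client.keyvalue.maxsize", "10485760");
        // The override must be in place before the Connection is created.
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            // ... use conn for the writes that previously hit "KeyValue size too large" ...
        }
    }
}

Because the Configuration is fixed when the connection is created, this should work the same whether the connection is opened in the driver or inside an executor task, as long as the conf.set call happens before createConnection.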


Source: https://stackoverflow.com/questions/60053217/set-hbase-properties-for-spark-job-using-spark-submit
