In Spark, sc.newAPIHadoopRDD is reading 2.7 GB of data with only 5 partitions

Submitted by 半腔热情 on 2019-12-23 03:35:19

Question


I am using Spark 1.4 and trying to read 2.7 GB of data from HBase using sc.newAPIHadoopRDD, but only 5 tasks are created for this stage, and processing takes 2 to 3 minutes. Can anyone tell me how to increase the number of partitions so the data is read faster? A sketch of the kind of read I am doing appears below.
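(A minimal sketch of the read described above, assuming a placeholder table name "my_table"; the question does not name the actual table.)

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))

    val hbaseConf = HBaseConfiguration.create()
    // "my_table" is a placeholder for the real table name.
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // The input format, not Spark, decides how many partitions the scan gets.
    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // On Spark 1.4, rdd.partitions.length reports the partition count
    // (getNumPartitions was only added in Spark 1.6).
    println(s"partitions: ${rdd.partitions.length}")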


Answer 1:


org.apache.hadoop.hbase.mapreduce.TableInputFormat creates one partition per region, and your table seems to be split into 5 regions. Pre-splitting the table should increase the number of partitions (see the HBase documentation on region splitting for more information).
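For illustration, a sketch of triggering a split through the HBase admin API, assuming an HBase 1.x-era client where HBaseAdmin.split(table, splitPoint) is available; the table name and split point below are placeholders, and real split points should come from your table's row-key distribution:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.HBaseAdmin

    val admin = new HBaseAdmin(HBaseConfiguration.create())

    // Request a split of "my_table" at a chosen row key; each resulting
    // region becomes one extra Spark partition on the next scan.
    // Both the table name and the split point are placeholders.
    admin.split("my_table", "row-split-point")

    admin.close()

If reshaping the table is not an option, calling rdd.repartition(n) right after the scan spreads the rows over more tasks for downstream stages, at the cost of an extra shuffle.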



Source: https://stackoverflow.com/questions/39628851/in-spark-sc-newapihadooprdd-is-reading-2-7-gb-data-the-with-5-partitions
