HBase scan operation caching

Submitted by 喜夏-厌秋 on 2019-12-11 10:38:38

Question


What is the difference between setCaching and setBatch in the HBase scan mechanism? Which one should I use for the best performance when scanning large volumes of data?


Answer 1:


Unless you have super-wide tables with many columns (or very large columns), you should forget about setBatch() entirely and focus exclusively on setCaching():


setCaching(int caching)

Set the number of rows for caching that will be passed to scanners. If not set, the Configuration setting HConstants.HBASE_CLIENT_SCANNER_CACHING will apply. Higher caching values will enable faster scanners but will use more memory.

setBatch(int batch)

Set the maximum number of values to return for each call to next()


setBatch controls how many values (cells) of a row are returned on each call to next(), so a wide row can come back in several pieces. Here's a good post about it: http://blog.jdwyah.com/2013/08/hbase-scan-batch-vs-cache.html
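To make the interaction between the two settings concrete, here is a simplified Java model. The class and method names are hypothetical and not part of the HBase client. It assumes every row has the same number of cells, that setBatch splits each row into Results of at most `batch` cells without mixing rows, and that each round trip carries at most `caching` Results; it ignores any byte-size limit on a scanner RPC.

```java
// Hypothetical cost model, NOT the HBase client: it only counts Results and round trips.
final class ScanCostModel {

    // Results per row = ceil(cellsPerRow / batch); batch <= 0 models "setBatch not called",
    // i.e. one Result per row.
    static long resultCount(long rows, int cellsPerRow, int batch) {
        long perRow = batch > 0 ? (cellsPerRow + batch - 1) / batch : 1;
        return rows * perRow;
    }

    // Round trips = ceil(results / caching), assuming each RPC returns at most `caching` Results.
    static long rpcCount(long rows, int cellsPerRow, int batch, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        long results = resultCount(rows, cellsPerRow, batch);
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // Each pair is {batch, caching}; the table is assumed to have 10 rows of 20 cells.
        int[][] settings = {{0, 1}, {0, 10}, {5, 10}, {1, 1}};
        for (int[] s : settings) {
            System.out.printf("batch=%d caching=%d -> %d Results, %d RPCs%n",
                    s[0], s[1], resultCount(10, 20, s[0]), rpcCount(10, 20, s[0], s[1]));
        }
    }
}
```

With 10 rows of 20 cells, batch 5 and caching 10, the model gives 40 Results fetched in 4 round trips; without a batch it gives 10 Results in a single trip.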




Answer 2:


Specify a scanner cache that is filled before the scan results are returned, by setting setCaching to the number of rows to cache before returning the result. By default, the table's caching setting is used. The goal is to balance I/O and network load.

public Scan setCaching(int caching)
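The I/O versus network trade-off can be put into rough numbers. In this back-of-the-envelope sketch the class is hypothetical, the one million rows and the 1 KiB average row size are assumptions chosen for illustration, and any per-RPC byte cap that HBase may apply is ignored.

```java
// Hypothetical arithmetic sketch of the caching trade-off; not HBase code.
final class CachingTradeoff {

    // Round trips needed to read `rows` rows when each trip returns up to `caching` rows.
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    // Rough upper bound on the bytes the client buffers per round trip.
    static long bufferedBytes(int caching, long avgRowBytes) {
        return caching * avgRowBytes;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;     // assumed scan size
        long avgRowBytes = 1_024L;  // assumed 1 KiB per row
        for (int caching : new int[] {1, 100, 1_000, 10_000}) {
            System.out.printf("caching=%d -> %d round trips, ~%d KiB buffered per trip%n",
                    caching, roundTrips(rows, caching), bufferedBytes(caching, avgRowBytes) / 1024);
        }
    }
}
```

Going from caching 1 to caching 100 cuts the round trips from 1,000,000 to 10,000 while the per-trip buffer grows to about 100 KiB; each further tenfold increase saves fewer trips in absolute terms while the memory keeps growing tenfold.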

To limit the number of columns if your table has very wide rows (rows with a large number of columns), use setBatch(int batch) and set it to the number of columns you want to return in one batch. A large number of columns is not a recommended design pattern.

public Scan setBatch(int batch)
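How setBatch breaks up a wide row can be sketched as plain list chunking. The helper below is hypothetical, not HBase code; as a simplified model it assumes each partial Result holds at most `batch` cells from a single row, and it uses strings in place of real cells.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of batching one wide row; not the HBase client.
final class BatchSplit {

    // Splits one row's cells into consecutive chunks of at most `batch` cells.
    static <T> List<List<T>> splitRow(List<T> cells, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < cells.size(); start += batch) {
            int end = Math.min(start + batch, cells.size());
            chunks.add(new ArrayList<>(cells.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < 7; i++) {
            cells.add("cf:col" + i);  // hypothetical column names
        }
        // 7 cells with batch = 3 -> chunks of 3, 3 and 1 cells
        for (List<String> chunk : splitRow(cells, 3)) {
            System.out.println(chunk);
        }
    }
}
```

A 7-cell row with batch 3 comes back as three pieces of 3, 3 and 1 cells, so client code that needs whole rows has to stitch consecutive pieces with the same row key back together.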

This is a useful link: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hbase_scanning.html



Source: https://stackoverflow.com/questions/28456876/hbase-scan-operation-caching
