HBase regions automatic splitting using hbase.hregion.max.filesize

白昼怎懂夜的黑 submitted on 2019-12-04 06:03:07
haydenmarchant

@mpiffaretti, what you are seeing is expected behaviour. I also got a bit of a shock the first time I saw the region sizes after an automatic split.

In HBase 0.94+, the default split policy is IncreasingToUpperBoundRegionSplitPolicy. With this policy, the size at which a region splits is determined by the algorithm described below.

Split size is the number of regions on this server that belong to the same table, cubed, times 2x the region flush size, OR the maximum region split size, whichever is smaller. For example, if the flush size is 128 MB, then after two flushes (256 MB) we will split, which makes two regions that will each split when their size reaches 2^3 * 128 MB * 2 = 2048 MB. If one of these regions splits, there are three regions and the split size is now 3^3 * 128 MB * 2 = 6912 MB, and so on until we reach the configured maximum file size (hbase.hregion.max.filesize), which is used from then on.
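The arithmetic above can be reproduced with a small sketch. This is not HBase code: the function name split_size_mb and its parameters are made up for illustration, the 10 GB maximum is taken from the limit mentioned in this answer, and any edge-case handling in the real policy class is not modelled.

```python
def split_size_mb(region_count, flush_size_mb=128, max_filesize_mb=10 * 1024):
    """Approximate split threshold in MB, following the rule described above.

    region_count    -- regions of this table on the same region server (>= 1)
    flush_size_mb   -- assumed memstore flush size (128 MB in the example)
    max_filesize_mb -- assumed hbase.hregion.max.filesize (10 GB here)
    """
    # regions cubed, times twice the flush size, capped at the maximum
    return min(max_filesize_mb, region_count ** 3 * 2 * flush_size_mb)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} region(s): split at {split_size_mb(n)} MB")
```

Under these assumptions the threshold is 256 MB with one region, 2048 MB with two, 6912 MB with three, and is capped at the 10240 MB maximum from four regions onwards.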

This is quite a nice strategy since you start to get a nice spread of regions over the region servers without having to wait until they reach the 10GB limit.

Alternatively, you would be better off pre-splitting your tables, since you want to make sure that you are getting the most out of the processing power of your cluster - if you have a single region, all requests will go to the region server to which that region is assigned. Pre-splitting puts control over how the regions are split across the row-key space into your hands.
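As a rough illustration of choosing split points, the sketch below computes evenly spaced boundaries. It assumes the row keys begin with a uniformly distributed hexadecimal prefix (for example a hash), which may not match your key design; the function name hex_split_points is hypothetical.

```python
def hex_split_points(num_regions, prefix_len=2):
    """Return num_regions - 1 evenly spaced split keys.

    Assumes row keys start with a uniformly distributed hex prefix
    of prefix_len characters (an assumption, not something HBase enforces).
    """
    space = 16 ** prefix_len  # number of distinct prefixes
    return [
        format(i * space // num_regions, f"0{prefix_len}x")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Four regions over a two-character hex prefix
    print(hex_split_points(4))
```

The resulting keys could then be supplied when the table is created, for instance through the SPLITS option of the HBase shell's create command, with table and column family names of your own choosing.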

Pre-splitting is the better option. Otherwise, your data may be continuously inserted into a single region, which then splits or compacts each time it reaches the region size limit.

In that situation writes are not uniformly distributed, and compaction of the table becomes a bottleneck for the writing modules.

The number of requests on the active region will be high.
