Question
We are using HBase version 1.1.4. The database has around 40 tables, and each table has a TimeToLive (TTL) specified for its data. It is deployed on a 5-node cluster, and the following is the hbase-site.xml:
<property>
<name>phoenix.query.threadPoolSize</name>
<value>2048</value>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>21474836480</value>
</property>
<property>
<name>hbase.hregion.memstore.block.multiplier</name>
<value>4</value>
</property>
<!-- default is 64MB 67108864 -->
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>536870912</value>
</property>
<!-- default is 7, should be at least 2x compactionThreshold -->
<property>
<name>hbase.hstore.blockingStoreFiles</name>
<value>240</value>
</property>
<property>
<name>hbase.client.scanner.caching</name>
<value>10000</value>
</property>
<property>
<name>hbase.bucketcache.ioengine</name>
<value>offheap</value>
</property>
<property>
<name>hbase.bucketcache.size</name>
<value>40960</value>
</property>
The problem is that the number of regions on each of the region servers keeps growing. Currently we only merge regions manually, using merge_region in the HBase shell.
Is there any way to have only a fixed number of regions, on each server, or an automated way to merge the regions?
Answer 1:
It mostly depends on your data, specifically on how it is distributed across keys. Assuming your values are roughly the same size for all keys, you can partition the key space with a hashed prefix.
For example, if your table key is a String and you want 100 regions, use this:
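import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hbase.util.Bytes;
// ZERO_BYTE was not defined in the original snippet; a single 0x00 separator is assumed
private static final byte[] ZERO_BYTE = new byte[] {0};
public static byte[] hashKey(String key) {
    // hashCode() % 100 lies in -99..99, so Math.abs cannot overflow here
    int partition = Math.abs(key.hashCode() % 100);
    String prefix = partitionPrefix(partition);
    // row key layout: <2-digit prefix> 0x00 <original key>
    return Bytes.add(Bytes.toBytes(prefix), ZERO_BYTE, Bytes.toBytes(key));
}
public static String partitionPrefix(int partition) {
    // zero-pad to two digits: 7 -> "07"
    return StringUtils.leftPad(String.valueOf(partition), 2, '0');
}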
In this case, all your keys will be prefixed with the numbers 00-99, so you have 100 partitions for 100 regions. Now you can disable region splits:
HTableDescriptor td = new HTableDescriptor(TableName.valueOf("myTable"));
td.setRegionSplitPolicyClassName("org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
or via the shell:
alter 'myTable', {TABLE_ATTRIBUTES => {METADATA => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy'}}}
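For the 100 prefixes to line up with 100 regions, the table would also need to be created pre-split on those prefixes. The following is only a sketch of how the split points could be generated; the class and method names are hypothetical, it uses plain Java rather than the HBase client, and it assumes the same String.hashCode() bucketing as hashKey above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class SaltedSplits {
    // Hypothetical helper: zero-padded prefix, e.g. 7 -> "07" when partitions = 100.
    public static String partitionPrefix(int partition, int partitions) {
        int width = String.valueOf(partitions - 1).length();
        return String.format("%0" + width + "d", partition);
    }

    // Same bucketing as hashKey: String.hashCode() modulo the partition count.
    public static int partitionOf(String key, int partitions) {
        return Math.abs(key.hashCode() % partitions);
    }

    // N partitions need N - 1 split points, because the first region has no start key.
    public static List<byte[]> splitKeys(int partitions) {
        List<byte[]> splits = new ArrayList<>();
        for (int p = 1; p < partitions; p++) {
            splits.add(partitionPrefix(p, partitions).getBytes(StandardCharsets.UTF_8));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> splits = splitKeys(100);
        String first = new String(splits.get(0), StandardCharsets.UTF_8);
        String last = new String(splits.get(splits.size() - 1), StandardCharsets.UTF_8);
        // prints "99 split points, first = 01, last = 99"
        System.out.println(splits.size() + " split points, first = " + first + ", last = " + last);
    }
}
```

Converted to a byte[][], the split points could then be passed to Admin.createTable(HTableDescriptor, byte[][]) when the table is created, so that the regions exist up front and disabling splits does not leave everything in a single region.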
Answer 2:
Is there any way to have only a fixed number of regions, on each server, or an automated way to merge the regions?
One way I have implemented this is to create the table with pre-split regions. For example:
create 'test_table', 'f1', SPLITS=> ['1', '2', '3', '4', '5', '6', '7', '8', '9']
Then design a good row key that starts with 1-9.
You can use Guava's murmur hash as shown below.
import com.google.common.hash.HashCode;
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import java.nio.charset.StandardCharsets;
/**
 * getMurmurHash.
 *
 * @param content the string to hash
 * @return HashCode the 128-bit murmur3 hash of the content
 */
public static HashCode getMurmurHash(String content) {
    final HashFunction hf = Hashing.murmur3_128();
    return hf.newHasher().putString(content, StandardCharsets.UTF_8).hash();
}
// rowKey is your original row key as a String
final long hash = getMurmurHash(rowKey).asLong();
// hash % 9 lies in -8..8; abs gives 0..8, and +1 shifts it to 1..9 to match the split points
final int prefix = Math.abs((int) (hash % 9)) + 1;
Now prepend this prefix to your row key.
For example
1rowkey1 // goes into the region starting at '1'
2rowkey2 // goes into the region starting at '2'
3rowkey3 // goes into the region starting at '3'
...
9rowkey9 // goes into the region starting at '9'
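Note that the nine split points in the create statement above produce ten regions: one below '1', plus one starting at each split point. A minimal sketch of that lookup is shown here, to make it easier to check where a prefixed key will land; the class and method names are hypothetical and the code does not call HBase, it only mimics the unsigned byte-wise ordering HBase uses for row keys.

```java
import java.nio.charset.StandardCharsets;

class RegionLookup {
    // Unsigned lexicographic comparison of byte arrays, the ordering HBase uses for row keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Returns the 0-based index of the region a row key falls into, given sorted split points.
    // Region 0 covers everything below the first split; region i starts at splits[i - 1].
    public static int regionIndex(String rowKey, String[] splits) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int index = 0;
        for (String split : splits) {
            if (compare(key, split.getBytes(StandardCharsets.UTF_8)) < 0) {
                break;
            }
            index++;
        }
        return index;
    }

    public static void main(String[] args) {
        String[] splits = {"1", "2", "3", "4", "5", "6", "7", "8", "9"};
        for (String key : new String[] {"0rowkey0", "1rowkey1", "9rowkey9"}) {
            // prints region 0, 1 and 9 respectively
            System.out.println(key + " -> region " + regionIndex(key, splits));
        }
    }
}
```

If the prefixes only ever take nine distinct values, one of the ten regions stays empty, so it may be worth matching the number of split points to the number of prefixes (eight split points for nine prefixes).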
If you are pre-splitting and want to manage region splits manually, you can also disable automatic region splits by setting hbase.hregion.max.filesize to a high number and setting the split policy to ConstantSizeRegionSplitPolicy. However, you should use a safeguard value such as 100 GB, so that regions do not grow beyond a region server's capabilities. You can consider disabling automated splitting and relying on the initial set of regions from pre-splitting if, for example, you are using uniform hashes for your key prefixes and you can ensure that the read/write load to each region, as well as its size, is uniform across the regions in the table.
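Whether the prefixes are uniform enough to rely on pre-splitting alone can be checked on a sample of keys before the table is created. The following is a rough sketch under some assumptions: String.hashCode() stands in for the murmur hash so that it runs without Guava, the sample keys are made up, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrefixDistribution {
    // Counts how many keys fall into each of `buckets` prefixes.
    // String.hashCode() is a stand-in; swap in the hash your row keys actually use.
    public static int[] countPerPrefix(Iterable<String> keys, int buckets) {
        int[] counts = new int[buckets];
        for (String key : keys) {
            counts[Math.abs(key.hashCode() % buckets)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        for (int i = 0; i < 90000; i++) {
            sample.add("device-" + i); // made-up sample keys
        }
        // One count per prefix; large differences between counts suggest hot regions.
        System.out.println(Arrays.toString(countPerPrefix(sample, 9)));
    }
}
```

This only looks at key counts. Region size and read/write load also depend on value sizes and access patterns, which a check like this does not capture.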
Also, look at
Source: https://stackoverflow.com/questions/41968676/hbase-number-of-regions-keep-growing