HBase row key design for monotonically increasing keys

Submitted by 回眸只為那壹抹淺笑 on 2019-12-21 04:02:46

Question


I've an HBase table where I'm writing the row keys like:

<prefix>~1
<prefix>~2
<prefix>~3
...
<prefix>~9
<prefix>~10

The scan on the HBase shell gives an output:

<prefix>~1
<prefix>~10
<prefix>~2
<prefix>~3
...
<prefix>~9

How should a row key be designed so that the row with key <prefix>~10 comes last? I'm looking for some recommended ways or the ways that are more popular for designing HBase row keys.


Answer 1:


How should a row key be designed so that the row with key <prefix>~10 comes last?

You see the scan output in this order because HBase keeps rowkeys sorted lexicographically, irrespective of the insertion order. Rowkeys are treated as arrays of bytes and compared byte by byte, not by numeric value, and the lowest rowkey appears first in a table. The character 1 sorts before the character 2, which is why <prefix>~10 appears before <prefix>~2. See the Rows section on this page to know more about this.

When you left-pad the integers with zeros, their natural ordering is preserved under lexicographic sorting, so the scan order matches the numeric order. To do that, you can design your rowkeys as suggested by @shutty.
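The effect of padding can be checked without a cluster. The following is a minimal sketch in plain Java, not HBase client code: the class name, the `paddedKey` helper, the prefix "p" and the width of 4 are all made up for illustration, and `Collections.sort` on ASCII strings stands in for the byte-wise ordering HBase applies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class PaddedKeyDemo {
    // Hypothetical helper: builds "<prefix>~<counter left-padded with zeros>".
    static String paddedKey(String prefix, long counter, int width) {
        return prefix + "~" + String.format(Locale.ROOT, "%0" + width + "d", counter);
    }

    // Returns the keys in the order a scan would list them. For ASCII-only
    // keys, String ordering is the same as byte-by-byte ordering.
    static List<String> scanOrder(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> plain = new ArrayList<>();
        List<String> padded = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            plain.add("p~" + i);
            padded.add(paddedKey("p", i, 4));
        }
        // Unpadded: p~10 lands right after p~1.
        System.out.println(scanOrder(plain));
        // Padded: p~0010 lands last, after p~0009.
        System.out.println(scanOrder(padded));
    }
}
```

The width has to be chosen up front: a width of 4 only keeps the order intact up to 9999, so it should cover the largest counter the table is expected to hold.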

I'm looking for some recommended ways or the ways that are more popular for designing HBase row keys.

Some general guidelines help in devising a good design:

  • Keep the rowkey as small as possible.
  • Avoid monotonically increasing rowkeys, such as timestamps. This is a poor schema design and leads to RegionServer hotspotting. If you can't avoid them, use a technique such as hashing or salting to spread the writes.
  • Avoid using Strings as rowkeys if possible. The String representation of a number takes more bytes than its integer or long representation. For example, a long is 8 bytes. You can store an unsigned number up to 18,446,744,073,709,551,615 in those eight bytes. If you stored this number as a String, presuming a byte per character, you would need nearly 3x the bytes.
  • Use some mechanism, like hashing, to get a uniform distribution of rows in case your regions are not evenly loaded. You could also create pre-split tables to achieve this.
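The salting idea from the list above can be sketched as follows. This is an illustration under stated assumptions, not an HBase API: the class name, the bucket count of 8, the "|" separator and the choice of hash are all hypothetical. The salt is derived from the key itself, so a reader that knows the logical key can recompute the full rowkey.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

class SaltedKeyDemo {
    // Assumed bucket count; in practice it would likely follow the number
    // of pre-split regions.
    static final int BUCKETS = 8;

    // Deterministic salt in the range [0, BUCKETS), computed from the key bytes.
    static int saltFor(String logicalKey) {
        int hash = Arrays.hashCode(logicalKey.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, BUCKETS);
    }

    // Rowkey layout assumed here: two-digit salt, separator, logical key.
    static String saltedKey(String logicalKey) {
        return String.format(Locale.ROOT, "%02d", saltFor(logicalKey)) + "|" + logicalKey;
    }

    public static void main(String[] args) {
        for (long i = 1; i <= 5; i++) {
            System.out.println(saltedKey("prefix~000" + i));
        }
    }
}
```

The trade-off is on the read side: consecutive counters no longer sit next to each other, so a range scan over the logical keys has to issue one scan per bucket and merge the results.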

See this link for more on rowkey design.

HTH




Answer 2:


HBase stores rowkeys in lexicographical order, so you can use a schema with fixed-length rowkeys:

<prefix>~0001
<prefix>~0002
<prefix>~0003
...
<prefix>~0009
<prefix>~0010

Keep in mind that you should also use random prefixes to avoid region hot-spotting (when a single region accepts most of the writes while the other regions are idle).




Answer 3:


Monotonically increasing keys aren't a good schema for HBase. You can read more here: http://hbase.apache.org/book/rowkey.design.html

There is also a link there to OpenTSDB, which solves this problem.




Answer 4:


Fixed-length keys are recommended where possible. Bytes.toBytes(long) can be used to get a byte array from a counter. The result sorts correctly for non-negative longs (0 through Long.MAX_VALUE); negative values have the sign bit set and therefore sort after the positive ones.
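To see why the binary form sorts well, here is a small self-contained sketch. It does not use the HBase library: `ByteBuffer.putLong` writes the same 8-byte big-endian layout that, to my understanding, Bytes.toBytes(long) produces, and `compareKeys` is a hand-written stand-in for the unsigned byte-by-byte comparison applied to rowkeys. The class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;

class LongKeyDemo {
    // 8-byte big-endian encoding of a counter (ByteBuffer is big-endian by default).
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Compares two keys byte by byte, treating each byte as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        // 2 sorts before 10, unlike the string keys "2" and "10".
        System.out.println(compareKeys(encode(2L), encode(10L)) < 0);
        // A negative counter sorts after a positive one because of the sign bit.
        System.out.println(compareKeys(encode(-1L), encode(1L)) > 0);
        // Every key is 8 bytes, whatever the magnitude of the counter.
        System.out.println(encode(Long.MAX_VALUE).length);
    }
}
```

The binary form is also compact: Long.MAX_VALUE takes 19 characters as a decimal string but 8 bytes here. The cost is readability, since such keys show up as escaped bytes in the HBase shell.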



Source: https://stackoverflow.com/questions/17792328/hbase-row-key-design-for-monotonically-increasing-keys
