HBase - How to add a super column family?

Posted by 前提是你 on 2019-12-12 03:59:59

Question


I am trying to create a Java application that converts a MySQL database to a NoSQL HBase database.
So far it reads the data from MySQL and inserts it into HBase correctly.
But now I am trying to handle relationships between the MySQL tables, and I understand that if there is a relationship you should add one of the tables as a super column family.
I looked in the Apache website documentation but couldn't find anything.
Any ideas?


Answer 1:


Column families have nothing to do with relationships. Instead, you have to build inverted indexes through careful row key design, which lets you retrieve data from one table efficiently (effectively O(1)) when you know the key from another. Alternatively, to avoid joins, try to store all the data in one row. Any tool that provides an SQL interface for HBase spawns jobs that take time to start and execute. HBase is fast when you do a Get operation or Scan successive rows. Hope this was useful.
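One way to picture the row key advice is the sketch below. It is only an illustration and does not use the HBase client: a `TreeMap` stands in for a table, because HBase also keeps rows sorted by row key. The class name, the `#` separator, the `customerId#orderId` key layout and the sample data are all assumptions made for this example. IDs are fixed width so that lexicographic order groups each customer's orders together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an HBase table: rows are kept sorted by row key,
// so a TreeMap is enough to illustrate a Get and a prefix Scan.
final class RowKeyDesignSketch {

    // Hypothetical separator; pick one that cannot occur inside an id.
    static final String SEP = "#";

    // Hypothetical key layout: each child row is prefixed with the parent id,
    // so all orders of one customer sit next to each other.
    static String orderKey(String customerId, String orderId) {
        return customerId + SEP + orderId;
    }

    // Emulates a Scan over the contiguous key range that shares one prefix.
    static List<String> scanPrefix(TreeMap<String, String> table, String prefix) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the prefix range, stop scanning
            }
            rows.add(e.getValue());
        }
        return rows;
    }

    // Made-up sample data standing in for a MySQL "orders" table.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> orders = new TreeMap<>();
        orders.put(orderKey("cust0001", "ord0007"), "keyboard");
        orders.put(orderKey("cust0002", "ord0003"), "monitor");
        orders.put(orderKey("cust0001", "ord0002"), "mouse");
        return orders;
    }

    public static void main(String[] args) {
        TreeMap<String, String> orders = sampleTable();
        // Point lookup by full key, comparable to a Get.
        System.out.println(orders.get(orderKey("cust0002", "ord0003")));
        // All orders of one customer, comparable to a Scan of successive rows.
        System.out.println(scanPrefix(orders, "cust0001" + SEP));
    }
}
```

With this layout the former foreign key becomes the leading part of the row key, so finding a customer's orders is a short range read instead of a join.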

Update

For more details about column families, check out the great book Architecting HBase Applications:

A column family is an HBase-specific concept that you will not find in other RDBMS applications. For the same region, different column families will store the data into different files and can be configured differently.

Data with the same access pattern and the same format should be grouped into the same column family. As an example regarding the format, if you need to store a lot of textual metadata information for customer profiles in addition to image files for each customer’s profile photo, you might want to store them into two different column families: one compressed (where all the textual information will be stored), and one not compressed (where the image files will be stored). As an example regarding the access pattern, if some information is mostly read and almost never written, and some is mostly written and almost never read, you might want to separate them into two different column families. If the different columns you want to store have a similar format and access pattern, regroup them within the same column family.

The write cache memory area for a given RegionServer is shared by all the column families configured for all the regions hosted by the given host. Abusing column families will put pressure on the memstore, which will generate many small files, which in turn will generate a lot of compactions that might impact the performance.

There is no technical limitation on the number of column families you can configure for a table. However, over the last three years, most of the use cases we had the chance to work on only required a single column family. Some required two column families, but each time we have seen more than two column families, it has been possible and recommended to reduce the number to improve efficiency. If your design includes more than three column families, you might want to take a deeper look at it and see if all those families are really required; most likely, they can be regrouped.
If you do not have any consistency constraints between your two column families and data will arrive into them at a different time, instead of creating two column families for a single table, you can also create two tables, each with a single column family. This strategy is useful when it comes time to decide the size of the regions. Indeed, while it was better to keep the two column families almost the same size, by splitting them across two different tables, it is now easier to let them grow independently.
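The storage layout described in the quoted passage can be pictured with a toy in-memory model. This is only a sketch and not how HBase is implemented: each column family gets its own sorted store, loosely mirroring the statement that different families of a region are written to different files. The class name, the `meta` and `photo` family names, the `/` separator and the sample values are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: one sorted store per column family.
final class ColumnFamilySketch {

    // family name -> (rowKey + "/" + qualifier -> value)
    private final Map<String, TreeMap<String, String>> stores = new HashMap<>();

    // Families are fixed when the "table" is created; qualifiers are free-form.
    ColumnFamilySketch(String... families) {
        for (String family : families) {
            stores.put(family, new TreeMap<>());
        }
    }

    private TreeMap<String, String> storeOf(String family) {
        TreeMap<String, String> store = stores.get(family);
        if (store == null) {
            throw new IllegalArgumentException("unknown family: " + family);
        }
        return store;
    }

    void put(String rowKey, String family, String qualifier, String value) {
        storeOf(family).put(rowKey + "/" + qualifier, value);
    }

    // Reads one family of one row; the stores of other families are not touched.
    Map<String, String> getFamily(String rowKey, String family) {
        Map<String, String> cells = new TreeMap<>();
        String prefix = rowKey + "/";
        for (Map.Entry<String, String> e : storeOf(family).tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // reached the cells of the next row
            }
            cells.put(e.getKey().substring(prefix.length()), e.getValue());
        }
        return cells;
    }

    public static void main(String[] args) {
        ColumnFamilySketch customers = new ColumnFamilySketch("meta", "photo");
        customers.put("cust0001", "meta", "name", "Ada");
        customers.put("cust0001", "meta", "city", "London");
        customers.put("cust0001", "photo", "profile", "<jpeg bytes>");
        // Only the textual metadata is read; the photo store is left alone.
        System.out.println(customers.getFamily("cust0001", "meta"));
    }
}
```

In this picture a family is a storage and tuning boundary inside one table, which is why it does not express a relationship between two MySQL tables.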

Also this answer can be useful.



Source: https://stackoverflow.com/questions/44880421/hbase-how-to-add-a-super-column-family
