HBase storing data for a particular column with 2 or more values for the same row-key in Scala/Java API

Posted by 主宰稳场 on 2019-12-11 07:29:45

Question


I have a file with the following contents:

UserID   Email             
1001     abc@yahoo.com     
1001     def@gmail.com     
1002     gft@gmail.com
1002     rtf@yahoo.com

I want to store the data like this:

ROW          COLUMN+CELL                                                                                   
1001         column=cf:Email, timestamp=1487917201278, value=abc@yahoo.com 
1001         column=cf:Email, timestamp=1487917201279, value=def@gmail.com                                                                                                
1002         column=cf:Email, timestamp=1487917201286, value=gft@gmail.com
1002         column=cf:Email, timestamp=1487917201287, value=rtf@yahoo.com

I am using put, for example put 'table', '1001', 'cf:Email', 'def@gmail.com', but it gives me:

ROW          COLUMN+CELL                                                                                    
1001         column=cf:Email, timestamp=1487917201279, value=def@gmail.com                                                                                                
1002         column=cf:Email, timestamp=1487917201286, value=rtf@yahoo.com

Each put overwrites the previous value. But HBase is supposed to store multiple values for a particular column, distinguished by timestamp. Is there any way to store both email addresses for a particular UserID?


Answer 1:


You may want to take a closer look at the HBase documentation on versions. Note especially where it says:

By default, i.e. if you specify no explicit version, when doing a get, the cell whose version has the largest value is returned

But I wouldn't use multiple versions to store multiple values this way. You have to specify the maximum number of versions explicitly, and it applies to every column in that family. I would be more inclined to use distinct column names (such as Email1, Email2, ...).
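As a rough illustration of the distinct-column idea, here is a minimal Java sketch that numbers the qualifier per row key before anything is written. It uses only the standard library and does not call the HBase client API; the class name EmailQualifiers and the method assign are hypothetical, and the input is assumed to be the file from the question with its header line already removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the HBase client API.
class EmailQualifiers {

    // Turns "rowKey value" lines into {rowKey, qualifier, value} triples,
    // numbering the qualifier per row key: Email1, Email2, ...
    static List<String[]> assign(List<String> lines) {
        Map<String, Integer> seen = new HashMap<>();
        List<String[]> cells = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            // Count how many values this row key has had so far.
            int n = seen.merge(parts[0], 1, Integer::sum);
            cells.add(new String[] {parts[0], "Email" + n, parts[1]});
        }
        return cells;
    }

    public static void main(String[] args) {
        // Assumption: the header line "UserID Email" has been stripped.
        List<String> lines = Arrays.asList(
            "1001     abc@yahoo.com",
            "1001     def@gmail.com",
            "1002     gft@gmail.com",
            "1002     rtf@yahoo.com");
        for (String[] c : assign(lines)) {
            // Each triple maps onto one shell-style put against family cf.
            System.out.println(
                "put 'table', '" + c[0] + "', 'cf:" + c[1] + "', '" + c[2] + "'");
        }
    }
}
```

Because every value lands in its own column, nothing is overwritten and the table can keep the default of one version per cell.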




Answer 2:


You need to specify the number of versions for the "cf" column family. By default, the number of versions is 1. Run the following in the HBase shell to modify an existing table:

alter 'table', {NAME => 'cf', VERSIONS => 2147483647}

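Even with more versions retained, get and scan return only the newest version by default, so request more explicitly when reading, for example scan 'table', {VERSIONS => 2}. Read more about versions in the HBase documentation.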
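To see why the version limit matters, the following toy Java model mimics a single cell that keeps at most a fixed number of timestamped values. It is only a sketch of the semantics described above, not the HBase client API: the class VersionedCell and its methods are hypothetical, and it trims old versions immediately, whereas a real cluster physically removes excess versions later, during compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory model of one HBase cell; not the HBase client API.
class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, ordered with the newest timestamp first
    private final TreeMap<Long, String> versions =
        new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value); // same timestamp replaces the value
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // drop the oldest version
        }
    }

    // What a default read returns: the value with the largest timestamp.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Up to n versions, newest first.
    List<String> read(int n) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == n) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCell one = new VersionedCell(1); // default family setting
        one.put(1487917201278L, "abc@yahoo.com");
        one.put(1487917201279L, "def@gmail.com");
        System.out.println(one.read(2)); // [def@gmail.com]

        VersionedCell two = new VersionedCell(2); // after raising VERSIONS
        two.put(1487917201278L, "abc@yahoo.com");
        two.put(1487917201279L, "def@gmail.com");
        System.out.println(two.read(2)); // [def@gmail.com, abc@yahoo.com]
    }
}
```

With a limit of 1 the second put leaves only def@gmail.com, which matches the output in the question. With a limit of 2 both values survive, but latest() still returns only the newest one.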



Source: https://stackoverflow.com/questions/42449609/hbase-storing-data-for-a-particular-column-with-2-or-more-values-for-the-same-ro
