Question
I have a file with the following contents:
UserID Email
1001 abc@yahoo.com
1001 def@gmail.com
1002 gft@gmail.com
1002 rtf@yahoo.com
I want to store the data like this:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201278, value=abc@yahoo.com
1001 column=cf:Email, timestamp=1487917201279, value=def@gmail.com
1002 column=cf:Email, timestamp=1487917201286, value=gft@gmail.com
1002 column=cf:Email, timestamp=1487917201287, value=rtf@yahoo.com
I am using put, for example:
put 'table', '1001', 'cf:Email', 'def@gmail.com'
but it gives me:
ROW COLUMN+CELL
1001 column=cf:Email, timestamp=1487917201279, value=def@gmail.com
1002 column=cf:Email, timestamp=1487917201286, value=rtf@yahoo.com
It is overwriting the previous value. But HBase is supposed to store multiple values for a particular column, distinguished by timestamp. Is there any way I can store both email addresses for a particular UserID?
Answer 1:
You may want to take a closer look at the HBase documentation on versions. Note especially where it says:
"By default, i.e. if you specify no explicit version, when doing a get, the cell whose version has the largest value is returned"
But I wouldn't pursue using multiple versions to store multiple values this way. You have to explicitly specify the maximum number of versions, and that setting applies to every column in that family. I would be more inclined to use distinct column names (such as Email1, Email2, ...).
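As a minimal sketch of the distinct-column approach, assuming the table and column family from the question ('table' and 'cf'); the qualifier names Email1 and Email2 are only illustrative:

```
# Each address gets its own column qualifier, so neither put replaces the other
put 'table', '1001', 'cf:Email1', 'abc@yahoo.com'
put 'table', '1001', 'cf:Email2', 'def@gmail.com'

# Returns both cells for the row
get 'table', '1001'
```

With this layout no change to the column family's VERSIONS setting is needed, but the application has to pick the next free qualifier when adding an address.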
Answer 2:
You need to specify the number of versions for the "cf" column family. By default (since HBase 0.96), the number of versions kept is 1. Run the following in the HBase shell to modify the existing table:
alter 'table', {NAME => 'cf', VERSIONS => 2147483647}
See the HBase documentation on versions for more details.
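Even when the column family keeps more than one version, get and scan still return only the newest cell unless you ask for more. A sketch of reading both values back, assuming the alter above has already been applied and the two puts are issued after it:

```
# Two writes to the same row and column; each gets its own timestamp
put 'table', '1001', 'cf:Email', 'abc@yahoo.com'
put 'table', '1001', 'cf:Email', 'def@gmail.com'

# Ask for up to 2 versions of the cell instead of only the latest
get 'table', '1001', {COLUMN => 'cf:Email', VERSIONS => 2}

# The same option works for a scan over the whole table
scan 'table', {VERSIONS => 2}
```

Values written before the family was altered, while it kept only one version, may already have been discarded, so they may need to be loaded again.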
Source: https://stackoverflow.com/questions/42449609/hbase-storing-data-for-a-particular-column-with-2-or-more-values-for-the-same-ro