Scan HTable rows for specific column value using HBase shell

前端 未结 7 910
臣服心动
臣服心动 2020-12-12 19:22

I want to scan rows in a HTable from hbase shell where a column family (i.e., Tweet) has a particular value (i.e., user_id).

Now I want to find all

相关标签:
7条回答
  • 2020-12-12 19:50

    From HBAse shell i think it is not possible because it is some how like query from which we use want to find spsecific data. As all we know that HBAse is noSQL so when we want to apply query or if we have a case like you then i think you should use Hive or PIG where as Hive is quiet good approach because in PIG we need to mess with scripts.
    Anyway you can get good guaidence about hive from here HIVE integration with HBase and from Here
    If yout only purpose is to view data not to get from code (of any client) then you can use HBase Explorer or a new and very good product but it is in its beta release is "HBase manager". You can get this from HBase Manager
    Its simple, and more importantly, it helps to insert and delete data, applying filters on column qualifiers from UI like other DBclients. Have a try.
    I hope it would be helpful for you :)

    0 讨论(0)
  • 2020-12-12 19:51

    An example of a text search for a value BIGBLUE in table t1 with column family of d:a_content. A scan of the table will show all the available values :-

    scan 't1'
    ...
    column=d:a_content, timestamp=1404399246216, value=BIGBLUE
    ...
    

    To search just for a value of BIGBLUE with limit of 1, try the below command :-

    scan 't1',{ COLUMNS => 'd:a_content', LIMIT => 1, FILTER => "ValueFilter( =, 'regexstring:BIGBLUE' )" }
    
    COLUMN+CELL
    column=d:a_content, timestamp=1404399246216, value=BIGBLUE
    

    Obviously removing the limit will show all occurrences in that table/cf.

    0 讨论(0)
  • 2020-12-12 19:55

    It is possible without Hive:

    scan 'filemetadata', 
         { COLUMNS => 'colFam:colQualifier', 
           LIMIT => 10, 
           FILTER => "ValueFilter( =, 'binaryprefix:<someValue.e.g. test1 AsDefinedInQuestion>' )" 
         }
    

    Note: in order to find all rows that contain test1 as value as specified in the question, use binaryprefix:test1 in the filter (see this answer for more examples)

    0 讨论(0)
  • 2020-12-12 19:55

    As there were multiple requests to explain this answer this additional answer has been posted.

    Example 1

    If

    scan '<table>', { COLUMNS => '<column>', LIMIT => 3 }
    

    would return:

    ROW     COLUMN+CELL
    ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
    ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
    ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
    

    then this filter:

    scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( =, 'binaryprefix:hello_value2') AND ValueFilter( =, 'binaryprefix:hello_value3')" }
    

    would return:

    ROW     COLUMN+CELL
    ROW2    column=<column>, timestamp=<timestamp>, value=hello_value2
    ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
    

    Example 2

    If not is supported as well:

    scan '<table>', { COLUMNS => '<column>', LIMIT => 3, FILTER => "ValueFilter( !=, 'binaryprefix:hello_value2' )" }
    

    would return:

    ROW     COLUMN+CELL
    ROW1    column=<column>, timestamp=<timestamp>, value=hello_value
    ROW3    column=<column>, timestamp=<timestamp>, value=hello_value3
    
    0 讨论(0)
  • 2020-12-12 20:02

    Nishu, here is solution I periodically use. It is actually much more powerful than you need right now but I think you will use it's power some day. Yes, it is for HBase shell.

    import org.apache.hadoop.hbase.filter.CompareFilter
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
    import org.apache.hadoop.hbase.filter.SubstringComparator
    import org.apache.hadoop.hbase.util.Bytes
    
    scan 'yourTable', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('family'), Bytes.toBytes('field'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('AAA')), COLUMNS => 'family:field' }
    

    Only family:field column is returned with filter applied. This filter could be improved to perform more complicated comparisons.

    Here are also hints for you that I consider most useful:

    • http://hadoop-hbase.blogspot.com/2012/01/hbase-intra-row-scanning.html - Intra-row scanning explanation (Java API).
    • https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/filter/FilterBase.html - JavaDoc for FilterBase class with links to descendants which actually can be used the same style. OK, shell syntax will be slightly different but having example above you can use this.
    0 讨论(0)
  • 2020-12-12 20:05

    To scan a table in hbase on the basis of any column value, SingleColumnValueFilter can be used as :

    scan 'tablename' ,
       { 
         FILTER => "SingleColumnValueFilter('column_family','col_name',>, 'binary:1')" 
       } 
    
    0 讨论(0)
提交回复
热议问题