hbase

Exporting HBase table to MySQL

帅比萌擦擦* submitted on 2019-12-23 04:44:11
Question: I am using hbase-0.90.6. I want to export data from HBase to MySQL. I know of a two-step process: first run a MapReduce job to pull the HBase data into flat files, then export the flat-file data into MySQL. Is there any other tool I can use to reduce this two-step process to one? Or can we use Sqoop to do the same in one step? Thanks. Answer 1: I'm afraid that Sqoop does not support exports directly from HBase at the moment. Sqoop can help you in the two-step process with the second step - e.g. Sqoop
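If the flat files are already in HDFS, the second step can be a single Sqoop export invocation; the following is only a hedged sketch in which the MySQL host, database, table, HDFS path, and field delimiter are all placeholders:

sqoop export \
  --connect jdbc:mysql://mysql-host/mydb \
  --username myuser -P \
  --table my_table \
  --export-dir /user/hbase_export \
  --input-fields-terminated-by '\t'

The MySQL table must already exist and its columns must line up with the fields written by the MapReduce job.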

HBase: Region Management and Master Working Mechanism

旧街凉风 submitted on 2019-12-23 04:43:01
In this post, 小菌 introduces HBase's region management and the master's working mechanism.

Region management. Let's first look at how regions are managed, with one standing rule: at any moment, a region can be assigned to only one region server. 1. The master keeps track of which region servers are currently available, which regions are assigned to which region servers, and which regions are still unassigned. 2. When a new region needs to be assigned and some region server has spare capacity, the master sends that region server a load request, assigning the region to it. 3. Once the region server receives the request, it starts serving that region.

Region server coming online. Premise: the master uses ZooKeeper to track region server state. 1. When a region server starts, it first creates a znode representing itself under the /hbase/rs directory in ZooKeeper. 2. The master subscribes to change notifications on the /hbase/rs directory, so whenever a node is added or removed under /hbase/rs, the master receives a real-time notification from ZooKeeper. As a result, as soon as a region server comes online, the master knows about it immediately.

Region server going offline. Premise:
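To illustrate the /hbase/rs mechanism described in step 2, here is a minimal standalone ZooKeeper sketch that lists the znodes under /hbase/rs and registers a watch. The quorum address is a placeholder, and this is only an illustration of the idea, not the master's actual implementation:

import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class RegionServerWatch {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper quorum used by HBase (placeholder address).
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, new Watcher() {
            public void process(WatchedEvent event) {
                // Fired when a child znode is added or removed under the watched path,
                // i.e. when a region server comes online or goes offline.
                System.out.println("Change under /hbase/rs: " + event);
            }
        });
        // Each live region server registers an ephemeral znode under /hbase/rs.
        List<String> servers = zk.getChildren("/hbase/rs", true);
        System.out.println("Live region servers: " + servers);
        Thread.sleep(60000); // keep the session open long enough to receive notifications
        zk.close();
    }
}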

HBase read performance varying abnormally

流过昼夜 submitted on 2019-12-23 04:34:50
Question: I've installed HBase 0.94.0. I had to improve my read performance through scans. I inserted 100000 random records. When I set setCache(100), my performance was 16 secs for 100000 records. When I set it to setCache(50), my performance was 90 secs for 100000 records. When I set it to setCache(10), my performance was 16 secs for 100000 records. public class Test { public static void main(String[] args) { long start, middle, end; HTableDescriptor descriptor = new HTableDescriptor("Student7");
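For reference, the question's setCache(...) reads like a local helper; the standard HBase 0.94 client knob is Scan.setCaching(), which controls how many rows each RPC to the region server returns. A minimal sketch against the question's Student7 table (timing and measurement left out):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "Student7");
        Scan scan = new Scan();
        scan.setCaching(100);       // rows fetched per RPC; larger values mean fewer round trips
        scan.setCacheBlocks(false); // avoid polluting the block cache during a full scan
        ResultScanner scanner = table.getScanner(scan);
        long count = 0;
        for (Result r : scanner) {
            count++;
        }
        scanner.close();
        table.close();
        System.out.println("Rows scanned: " + count);
    }
}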

Get count of HBase table based on dates

醉酒当歌 submitted on 2019-12-23 04:28:57
Question: What would be the easiest way to get a count of HBase table rows for a time period, using the inserted timestamp? So far I have only found: hbase> count 't1', INTERVAL => 100000 This does not solve my problem. There seems to be another option, but I am getting 0 results: hbase> get 'hbase_output', '*', {TIMERANGE => [1445212800,1445299200]} COLUMN CELL 0 row(s) in 0.0900 seconds Would these be the only two options to do this? I put the '*' for all rows in the table and thinking this may be
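One likely culprit: HBase cell timestamps are milliseconds by default, while the TIMERANGE values above look like seconds, which would explain the 0 rows. Below is a hedged Java sketch that counts rows within a time range, assuming the same hbase_output table and the question's bounds converted to milliseconds:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

public class CountByTimeRange {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "hbase_output");
        Scan scan = new Scan();
        // Timestamps are in milliseconds; these are the question's bounds * 1000.
        scan.setTimeRange(1445212800000L, 1445299200000L);
        scan.setFilter(new FirstKeyOnlyFilter()); // return only one cell per row to keep the scan light
        scan.setCaching(500);
        ResultScanner scanner = table.getScanner(scan);
        long count = 0;
        for (Result r : scanner) {
            count++;
        }
        scanner.close();
        table.close();
        System.out.println("Rows in range: " + count);
    }
}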

How to compare a string in Java with another string stored in HBase, case-insensitively?

空扰寡人 submitted on 2019-12-23 04:26:50
Question: I am new to HBase. I want to compare a string in Java with another string stored in HBase, case-insensitively. How can I achieve this? Thanks in advance ... Answer 1: You could try RegexStringComparator, for example: RegexStringComparator regexStringComparator = new RegexStringComparator("^[aA][bB][cC]$"); // will match aBc or ABC or Abc in any case, in order of a followed by b followed by c. SingleColumnValueFilter filter = new SingleColumnValueFilter( cf, column, CompareOp.EQUAL, regexStringComparator );
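A slightly shorter variant of the same approach: since RegexStringComparator takes a Java regex, the inline (?i) flag makes the whole pattern case-insensitive, so each character class does not need to be spelled out. The cf/column names and the literal "abc" below are placeholders:

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class CaseInsensitiveMatch {
    public static Scan buildScan() {
        byte[] cf = Bytes.toBytes("cf");
        byte[] column = Bytes.toBytes("column");
        // (?i) makes the Java regex case-insensitive, so "abc", "ABC", "aBc" all match.
        RegexStringComparator comparator = new RegexStringComparator("(?i)^abc$");
        SingleColumnValueFilter filter =
            new SingleColumnValueFilter(cf, column, CompareOp.EQUAL, comparator);
        // Skip rows that don't have the column at all, instead of emitting them anyway.
        filter.setFilterIfMissing(true);
        Scan scan = new Scan();
        scan.setFilter(filter);
        return scan;
    }
}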

“java.io.IOException: Pass a Delete or a Put” when reading HDFS and storing to HBase

不羁的心 submitted on 2019-12-23 04:20:13
Question: I have been going crazy over this error for a week. There was a post with the same problem, Pass a Delete or a Put error in hbase mapreduce, but that resolution did not really work for me. My Driver: Configuration conf = HBaseConfiguration.create(); Job job; try { job = new Job(conf, "Training"); job.setJarByClass(TrainingDriver.class); job.setMapperClass(TrainingMapper.class); job.setMapOutputKeyClass(LongWritable.class); job.setMapOutputValueClass(Text.class); FileInputFormat.setInputPaths(job, new
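For context, that exception usually means the value handed to TableOutputFormat is not a Put or a Delete; whatever the reducer emits toward HBase must be one of those two types. Below is a hedged sketch of a reducer shape that satisfies this (family and qualifier names are placeholders, and this is not the poster's actual reducer):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

public class TrainingReducer
        extends TableReducer<LongWritable, Text, ImmutableBytesWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // The row key and the emitted data must be wrapped in a Put (or Delete);
            // writing a Text value here is what triggers "Pass a Delete or a Put".
            Put put = new Put(Bytes.toBytes(key.get()));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(value.toString()));
            context.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }
}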

HBase Read and Write Flow

岁酱吖の submitted on 2019-12-23 03:46:02
In this post, 小菌 introduces HBase's read and write flows.

Read request flow. Before the walkthrough, a few prerequisites. What is the meta table? The meta table is a table that ships with the HBase system; it stores the metadata of HBase user tables. What is that metadata? Each row in the meta table records the start-key-to-end-key range of one region of a user table. Where does the meta table live? The meta table is stored on a region server. Which region server exactly? ZooKeeper knows. With those concepts clear, the rest is much easier to follow. The relationship between meta and regions is as follows (the root table was removed as of HBase 0.96). The concrete read flow: 1. Ask ZooKeeper where the meta table is. 2. Go to the node (region server) hosting meta and read the meta table. 3. Find the region, obtain the region-to-region-server mapping, and read the region data directly from that region server.

Write request flow. The write flow is slightly more complex than the read flow. 1. The client first accesses ZooKeeper, locates the meta table, and reads its metadata to determine which HRegion and HRegionServer the data to be written belongs to. 2. The client sends a write request to that HRegionServer. The client first writes the data to the HLog to guard against data loss.
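From the client's perspective, the whole lookup chain above (ZooKeeper, then meta, then the owning region server) happens behind a single call. A minimal sketch with the table name, row key, and column names as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadFlowExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // The client library asks ZooKeeper for meta, reads meta to locate the region,
        // then talks to the owning region server directly; locations are cached for reuse.
        HTable table = new HTable(conf, "user_table");
        Get get = new Get(Bytes.toBytes("row1"));
        Result result = table.get(get);
        byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"));
        System.out.println(value == null ? "not found" : Bytes.toString(value));
        table.close();
    }
}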

In Spark, sc.newAPIHadoopRDD is reading 2.7 GB of data with 5 partitions

半腔热情 submitted on 2019-12-23 03:35:19
Question: I am using Spark 1.4 and I am trying to read 2.7 GB of data from HBase using sc.newAPIHadoopRDD, but only 5 tasks are created for this stage and it takes 2 to 3 minutes to process. Can anyone let me know how to increase the number of partitions so the data is read faster? Answer 1: org.apache.hadoop.hbase.mapreduce.TableInputFormat creates a partition for each region. Your table seems to be split into 5 regions. Pre-splitting your table should increase the number of partitions (have a
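Because scan parallelism is fixed at one task per region, one hedged workaround when re-splitting the table is not an option is to repartition right after the read so downstream stages get more tasks (the scan itself still runs with 5). A Java sketch with the table name and partition count as placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadPartitions {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("hbase-read"));
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "my_table"); // placeholder table name

        // TableInputFormat creates one partition per HBase region,
        // so this RDD starts with as many partitions as the table has regions.
        JavaPairRDD<ImmutableBytesWritable, Result> rdd = sc.newAPIHadoopRDD(
                conf, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);

        // Result is not serializable, so extract what you need before any shuffle,
        // then repartition to give downstream stages more parallelism.
        JavaRDD<String> rowKeys = rdd.map(t -> Bytes.toString(t._1().get()));
        JavaRDD<String> wider = rowKeys.repartition(20);

        System.out.println("Rows: " + wider.count());
        sc.stop();
    }
}

Note that repartition adds a shuffle; it speeds up later stages but not the HBase scan itself, which is why the answer recommends pre-splitting the table.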

How can I skip HBase rows that are missing specific columns?

久未见 submitted on 2019-12-23 02:29:10
Question: I'm writing a MapReduce job over HBase using a table mapper. I want to skip rows that don't have specific columns. For example, if the mapper reads from the "meta" family, "source" qualifier column, the mapper should expect something to be in that column. I know I can add columns to the scan object, but I expect this merely limits which rows can be seen by the scan, not which columns need to be there. What filter can I use to skip rows without the columns I need? Also, the filter concept itself
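One common answer here is SingleColumnValueFilter with setFilterIfMissing(true): the scan then drops any row that has no cell in that column. A hedged sketch against the question's meta:source column, comparing against an empty value so that any present, non-empty value passes (a row whose value is literally empty would also be dropped):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RequireColumnScan {
    public static Scan buildScan() {
        // Rows where meta:source is missing are filtered out by setFilterIfMissing(true);
        // rows with any non-empty value in that column pass the NOT_EQUAL comparison.
        SingleColumnValueFilter filter = new SingleColumnValueFilter(
                Bytes.toBytes("meta"), Bytes.toBytes("source"),
                CompareOp.NOT_EQUAL, new byte[0]);
        filter.setFilterIfMissing(true);
        Scan scan = new Scan();
        scan.setFilter(filter);
        return scan;
    }
}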