hbase

Count number of records in a column family in an HBase table

Submitted by 喜你入骨 on 2019-12-23 14:02:13
Question: I'm looking for an HBase shell command that will count the number of records in a specified column family. I know I can run:

echo "scan 'table_name'" | hbase shell | grep column_family_name | wc -l

However, this runs much slower than the standard counting command count 'table_name', CACHE => 50000 (because of the CACHE => 50000), and worse, it doesn't return the real number of records but something like the total number of cells (if I'm not mistaken?) in the specified column family.
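
One alternative worth sketching (hedged: based on the RowCounter usage documented for HBase 1.x; the table and family names below are placeholders): the RowCounter MapReduce job that ships with HBase accepts optional column arguments, so it can count rows that have at least one cell in a given family, with MapReduce parallelism instead of a single client-side scan:

# Count rows that contain at least one cell in family 'column_family_name'
hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table_name' 'column_family_name'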

Thrift HBase client - support for filters and coprocessors

Submitted by 走远了吗. on 2019-12-23 13:05:32
Question: Sadly, my HBase client language is Python; I am using happybase for now, which is based on Thrift AFAIK. I know Thrift so far still does not support filters or coprocessors (correct me if I am wrong here). Can someone point me to any Jira items where I can track the plan/progress, if there is one? The only ones I can find are from "HBase in Action": "Thrift server to match the new Java API": https://issues.apache.org/jira/browse/HBASE-1744 "Make Endpoint Coprocessors Available from Thrift": https:/

Encrypting HBase at-rest data in the cloud

Submitted by ╄→гoц情女王★ on 2019-12-23 12:14:45
Question: I am pretty new to HBase and have been assigned the task of moving our infrastructure to the cloud. Our HBase data contains some customer information and hence needs to be encrypted at rest. I am already reading this: Transparent Encryption of Data At Rest (http://hbase.apache.org/book/ch08s03.html#hbase.encryption.server). It looks like a good solution except for the fact that we have to store the password as plain text on each node. Is there a way to avoid this? Like storing the password at just one
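
For context on where that plain-text password comes from, a minimal sketch of the server-side setup the referenced chapter describes (the alias, keystore path, and password handling are placeholders; check the exact property names against your HBase version):

# Create a JCEKS keystore holding the cluster master key
keytool -genseckey -keyalg AES -keysize 128 -storetype jceks \
    -alias hbase -keystore /etc/hbase/conf/hbase.jks

# hbase-site.xml then points hbase.crypto.keyprovider at
# KeyStoreKeyProvider, and hbase.crypto.keyprovider.parameters embeds
# the keystore URI, including the store password in plain text,
# which is exactly the part the question wants to avoid.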

How to get a region in HBase which is stuck in the FAILED_OPEN state?

Submitted by 我是研究僧i on 2019-12-23 12:07:49
Question: HBase hbck runs successfully and reports no inconsistency, but three regions are stuck in transition: 2 of the 3 are in CLOSED state and 1 is in FAILED_OPEN state (all three regions belong to a single table). Since HBase is in a consistent state there is no issue with normal HBase operation, but I am not able to run the balancer while regions are stuck in transition. How do I remove/move these regions out of transition? I tried the command below before posting this question. hbase hbck
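
A commonly tried follow-up (hedged: these hbck1 options apply to HBase 1.x, and the encoded region name below is a placeholder taken from the master UI):

# Ask hbck to repair region assignments
hbase hbck -fixAssignments

# Or manually assign a single stuck region from the HBase shell
echo "assign 'ENCODED_REGION_NAME'" | hbase shell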

Why is an exported HBase table 4 times bigger than the original?

Submitted by 孤者浪人 on 2019-12-23 12:01:22
Question: I need to back up an HBase table before updating to a newer version. I decided to export the table to HDFS with the standard Export tool and then move it to the local file system. For some reason the exported table is 4 times larger than the original one:

hdfs dfs -du -h
1.4T  backup-my-table

hdfs dfs -du -h /hbase/data/default/
417G  my-table

What can be the reason? Is it somehow related to compression? P.S. Maybe the way I made the backup matters. First I made a snapshot from the target table, then cloned it to a copy
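
One plausible explanation (an assumption, not confirmed by the question): on-disk HFiles are usually block-compressed and encoded, while Export writes row data out to SequenceFiles uncompressed by default, so the blow-up can come from losing that compression. A hedged sketch of re-enabling compression on the Export job via standard MapReduce output properties (table name and backup path are placeholders):

# Export with compressed SequenceFile output
hbase org.apache.hadoop.hbase.mapreduce.Export \
    -D mapreduce.output.fileoutputformat.compress=true \
    -D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
    my-table /backup/my-table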

Bulk-loading data from HDFS into HBase with BulkLoad

Submitted by 老子叫甜甜 on 2019-12-23 11:46:32
When writing data to HBase, the common approaches are the HBase API and MapReduce batch imports. With these approaches, a record written to HBase roughly follows the flow shown in the figure: the data is first written to the write-ahead log (WAL), then to the in-memory MemStore, and finally flushed to an HFile. Writing this way does not lose data and guarantees ordering, but when a large volume of data has to be written, throughput is hard to sustain. This post therefore introduces a higher-performance write path: BulkLoad.

Bulk-writing data with BulkLoad consists of two parts (a command sketch for the second part follows the pom snippet below):

1. Use HFileOutputFormat2 in a hand-written MapReduce job to write HFiles into an HDFS directory. Since data written to HBase must be sorted, HFileOutputFormat2's configureIncrementalLoad() performs the required configuration.

2. Move the HFiles from HDFS into the HBase table, roughly as shown in the figure.

Example pom dependencies:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop<
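
A hedged sketch of step 2 (the output directory and table name are placeholders; LoadIncrementalHFiles is the bulk-load tool bundled with HBase 1.x):

# Move the HFiles produced by HFileOutputFormat2 into the target table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfile-output my_table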

Creating an HBase table gives "xxxxx is disabled."

Submitted by 一笑奈何 on 2019-12-23 10:08:39
When using the hbase shell, the error "SearchCount is disabled" appears:

hbase(main):002:0> count 'SearchCount'

ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: SearchCount is disabled.

Here is some help for this command:
Count the number of rows in a table. This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar hbase.jar rowcount' to run a counting mapreduce job). Current count is shown every 1000 rows by default. Count interval may be optionally specified. Scan caching is enabled on count scans by default. Default cache size is 10 rows. If your rows are small in size, you
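
The exception says the table is disabled, so the likely fix (a sketch using standard hbase shell commands) is to enable the table before counting:

hbase(main):003:0> enable 'SearchCount'
hbase(main):004:0> count 'SearchCount'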

The HBase HRegionServer process fails to start (java.lang.RuntimeException: HRegionServer Aborted)

Submitted by 冷暖自知 on 2019-12-23 09:45:58
Error description: after the HBase cluster starts, the HRegionServer processes on the slave nodes fail to start.

Cause: the cluster clocks are not synchronized.

Resolution steps:
1. Check the HBase startup log on the node where the failure occurred. The exception is java.lang.RuntimeException: HRegionServer Aborted, caused by the unsynchronized cluster time. (The original post compared log screenshots of a healthy node and the failing node.)
2. Synchronize the cluster time by running the following command on all three nodes: ntpdate ntp4.aliyun.com. Alternatively, use a crontab scheduled job to sync the nodes that cannot reach the internet against one server that can.
3. Restart the HBase cluster.

Source: CSDN. Author: 辛Lay. Link: https://blog.csdn.net/weixin_38097878/article/details/103659664
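
A hedged sketch of step 2 (the NTP host in the crontab line is a placeholder for your internal time server):

# Run on every node to sync against the Aliyun NTP server
ntpdate ntp4.aliyun.com

# Or, for nodes without internet access, sync every 10 minutes
# against an internal reachable server via crontab:
*/10 * * * * /usr/sbin/ntpdate internal-ntp-host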

Importing data from SQL Server to HBase

Submitted by 一笑奈何 on 2019-12-23 05:17:08
Question: I know that Sqoop allows us to import data from an RDBMS into HDFS. I was wondering whether the SQL Server connector in Sqoop also allows us to import directly into HBase? I know we can do this with MySQL; can the same be done with SQL Server too?

Answer 1: I am working in the Hortonworks Sandbox, and I was able to pull data from a SQL Server instance into an HBase table by doing the following steps: Get the SQL Server JDBC driver onto the Hadoop box. curl -L 'http://download
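
For reference, a hedged sketch of the kind of Sqoop invocation that imports straight into HBase (host, database, credentials, and table/family names are all placeholders; --hbase-table, --column-family, and --hbase-create-table are standard Sqoop import flags):

sqoop import \
    --connect 'jdbc:sqlserver://db-host:1433;databaseName=mydb' \
    --username sqoop_user --password '***' \
    --table customers \
    --hbase-table customers \
    --column-family cf \
    --hbase-create-table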