hbase

Accessing HBase running in a VM with a client on the host system

Submitted by 这一生的挚爱 on 2019-12-21 04:54:12
Question: I am trying to write some data to HBase with a client program. HBase (on Hadoop) runs in a preconfigured VM from Cloudera on Ubuntu. The client runs on the system hosting the VM, and running the client directly in the VM works, so now I want to use the client outside the VM to access the servers on the VM. I'm using NAT, so to be able to reach the servers running on the VM (HBase Master, HUE, ...) I configured port forwarding in VirtualBox. Thus I can reach the overview pages of the HBase Master, HUE, ... To
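What usually bites in this setup is that, even with the ZooKeeper and master ports forwarded, HBase redirects the client to the region server using the hostname the VM registered in ZooKeeper. A sketch of a client-side hbase-site.xml for this scenario (the port values and the assumption of localhost forwarding are illustrative, not taken from the question):

```xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>localhost</value>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
</property>
```

Because HBase hands the client the region server's internal hostname (the Cloudera VM's own name), the host system typically also needs an /etc/hosts entry mapping that hostname to 127.0.0.1 so that connections go through the forwarded ports.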

HBase row key design for monotonically increasing keys

Submitted by 回眸只為那壹抹淺笑 on 2019-12-21 04:02:46
Question: I have an HBase table where I'm writing row keys like: <prefix>~1 <prefix>~2 <prefix>~3 ... <prefix>~9 <prefix>~10. A scan in the HBase shell gives the output: <prefix>~1 <prefix>~10 <prefix>~2 <prefix>~3 ... <prefix>~9. How should a row key be designed so that the row with key <prefix>~10 comes last? I'm looking for recommended or widely used approaches to designing HBase row keys. Answer 1: How should a row key be designed so that the row with key ~10 comes last? You see the
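HBase sorts row keys lexicographically as bytes, which is why "10" sorts before "2". The usual fix is to zero-pad the numeric suffix to a fixed width so byte order matches numeric order. A minimal sketch (the prefix and the width of 4 are made-up illustrations):

```python
# HBase compares row keys byte-by-byte, so "prefix~10" < "prefix~2".
# Zero-padding the numeric part to a fixed width restores numeric order.

def make_row_key(prefix, n, width=4):
    """Build a row key whose padded numeric suffix sorts numerically."""
    return "%s~%0*d" % (prefix, width, n)

keys = [make_row_key("prefix", n) for n in (1, 2, 3, 9, 10)]
# Lexicographic sort (what an HBase scan returns) now matches numeric order.
assert sorted(keys) == keys
print(keys)  # ['prefix~0001', 'prefix~0002', 'prefix~0003', 'prefix~0009', 'prefix~0010']
```

The width must be chosen up front to accommodate the largest expected value; widening it later changes the sort position of existing keys.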

Installing and using Sqoop 1.4.7

Submitted by 青春壹個敷衍的年華 on 2019-12-21 03:05:06
Overview. Official site: http://sqoop.apache.org/ Which systems or components will Sqoop interact with in practice? HDFS, MapReduce, YARN, ZooKeeper, Hive, HBase, MySQL. Sqoop 1 is just a tool; it only needs to be installed on a single node. If Sqoop is to interact with Hive, the Sqoop node must have Hive installed. Versions: Sqoop 1 vs. Sqoop 2; Sqoop 1.4.7 is used here (it is also the version bundled with CDH 6.3). Download and install: the same routine as other Apache big-data components: download, unpack, edit conf, start. Download: http://www.apache.org/dyn/closer.lua/sqoop/1.4.7
[admin@centos7x3 sqoop]$ vim conf/sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/software/hadoop
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/software/hadoop
#set the path to where bin
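Once sqoop-env.sh is configured (and a JDBC driver such as mysql-connector-java has been placed in Sqoop's lib/ directory), a typical MySQL-to-HDFS import looks like the following sketch; the connection string, database, table, and target directory are made-up examples:

```shell
sqoop import \
  --connect jdbc:mysql://mysqlhost:3306/testdb \
  --username root \
  --password 123456 \
  --table user \
  --target-dir /sqoop/import/user \
  --num-mappers 1
```

--num-mappers 1 avoids needing a --split-by column for tables without a numeric primary key; for large tables, more mappers parallelize the import.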

Integrating Hive with HBase

Submitted by 孤街浪徒 on 2019-12-21 00:49:40
Abstract: Hive provides integration with HBase, making it possible to run HQL queries and inserts against HBase tables, including complex queries such as joins and unions; data in Hive tables can also be mapped into HBase.
Use cases:
2.1 Storing ETL results in HBase
2.2 HBase as a data source for Hive
2.3 Building a low-latency data warehouse
Environment setup:
3.1 Hive/HBase integration: edit hive-site.xml and add the ZooKeeper quorum property:
[root@hadoop01 conf]# vim hive-site.xml
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node1:2181,node2:2181,node3:2181</value>
</property>
Pull in HBase's dependency jars: add the jars under HBase's lib directory to Hive's classpath in hive-env.sh:
[root@hadoop01 conf]# vim hive-env.sh
export HIVE_CONF_DIR=/usr/local/hive/conf
export HIVE_CLASSPATH=$HIVE_CLASSPATH:$HBASE_HOME/lib/*
At this point
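With that configuration in place, a Hive table can be mapped onto an HBase table via the HBaseStorageHandler; the table name, column family cf1, and column val below are illustrative, not from the post:

```sql
CREATE TABLE hbase_hive_table(key int, value string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "hbase_hive_table");
```

":key" binds the Hive key column to the HBase row key; each remaining entry in hbase.columns.mapping pairs a Hive column with an HBase family:qualifier.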

HBase - What's the difference between WAL and MemStore?

Submitted by 我的未来我决定 on 2019-12-20 21:53:33
Question: I am trying to understand the HBase architecture. I see two different terms used for the same purpose: the write-ahead log (WAL) and the MemStore. Both store new data that hasn't yet been persisted to permanent storage. What's the difference between the WAL and the MemStore? Update: WAL - used to recover not-yet-persisted data in case a server crashes. MemStore - stores updates in memory as sorted key-values. It seems like a lot of duplication of data before it is written to disk. Answer 1: WAL is for
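The duplication is deliberate: the WAL is an append-only on-disk log that exists purely for durability and crash recovery, while the MemStore is a sorted in-memory buffer that serves reads and is later flushed to StoreFiles. A toy sketch of this write path (not HBase code, just the idea):

```python
# Toy model of the HBase write path: every put is first appended to a
# write-ahead log (durable, replayable), then applied to an in-memory
# buffer (fast to read, flushed to a StoreFile when full).

class ToyRegionServer:
    def __init__(self):
        self.wal = []        # append-only log; on disk in real HBase
        self.memstore = {}   # in-memory key -> value; lost on a crash

    def put(self, key, value):
        self.wal.append((key, value))   # 1. durability first
        self.memstore[key] = value      # 2. then the fast in-memory copy

    def crash_and_recover(self):
        """Lose the MemStore (it lives in RAM), then rebuild it by replaying the WAL."""
        self.memstore = {}
        for key, value in self.wal:
            self.memstore[key] = value

rs = ToyRegionServer()
rs.put("row1", "a")
rs.put("row2", "b")
rs.crash_and_recover()
assert rs.memstore == {"row1": "a", "row2": "b"}
```

So the two structures hold the same data only briefly and for different reasons: the WAL is never read on the normal path, and the MemStore alone would lose acknowledged writes on a crash.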

HBase -- Java API basics (creating an HBase table, adding data, querying)

Submitted by 心不动则不痛 on 2019-12-20 19:49:49
pom.xml configuration:
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-mr1-cdh5.14.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.0-cdh5.14.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.0-cdh5.14.0</version>
  </dependency>
  <dependency>

HBase Java API: Retrieving all rows that match a partial row key

Submitted by 随声附和 on 2019-12-20 19:44:17
Question: In the Python module happybase, I can retrieve all rows whose row key starts with a given string (i.e., search using a partial row key). Say I have a row key in the format (ID|TYPE|DATE); I can find all rows with an ID of 1 and a TYPE of A with:
import happybase
connection = happybase.Connection('hmaster-host.com')
table = connection.table('table_name')
for key, data in table.scan(row_prefix="1|A|"):
    print key, data
This is what I have so far as a totally client side
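Under the hood, a prefix scan is just a range scan whose start row is the prefix and whose stop row is the prefix with its last byte incremented; computing that stop row yourself is handy when a client only exposes start/stop rows. A small, purely client-side sketch of the computation (no HBase connection needed):

```python
def prefix_stop_row(prefix):
    """Smallest row key greater than every key starting with `prefix`.
    Increment the last byte; a 0xff byte cannot be incremented, so it is
    dropped and the carry moves left."""
    b = bytearray(prefix.encode("utf-8"))
    while b and b[-1] == 0xFF:
        b.pop()                # carry past 0xff bytes
    if not b:
        return b""             # prefix was all 0xff: scan to end of table
    b[-1] += 1
    return bytes(b)

start = b"1|A|"
stop = prefix_stop_row("1|A|")
assert stop == b"1|A}"  # '|' is 0x7c, so the incremented last byte is '}'
# A key matches the prefix exactly when start <= key < stop:
assert start <= b"1|A|row42" < stop
assert not (start <= b"1|B|row42" < stop)
```

Given start and stop, the same result as row_prefix can be obtained with a scan over that range (e.g. happybase's row_start/row_stop arguments, or a Java Scan with setStartRow/setStopRow).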

Tuning Hive Queries That Use an Underlying HBase Table

Submitted by 十年热恋 on 2019-12-20 13:36:48
Question: I've got a table in HBase, let's say "tbl", and I would like to query it using Hive. Therefore I mapped the table into Hive as follows:
CREATE EXTERNAL TABLE tbl(id string, data map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,data:")
TBLPROPERTIES("hbase.table.name" = "tbl");
Queries like "select * from tbl", "select id from tbl", "select id, data from tbl" are really fast. But queries like select id from tbl

How does HBase enable Random Access to HDFS?

Submitted by 拟墨画扇 on 2019-12-20 12:32:37
Question: Given that HBase is a database whose files are stored in HDFS, how does it enable random access to a single piece of data within HDFS? By what method is this accomplished? From the Apache HBase Reference Guide: "HBase internally puts your data in indexed 'StoreFiles' that exist on HDFS for high-speed lookups. See Chapter 5, Data Model, and the rest of this chapter for more information on how HBase achieves its goals." Scanning both chapters didn't reveal a high-level answer for this
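The high-level answer is that HBase is an LSM-tree store: a read never seeks arbitrarily inside HDFS. A get first locates the region owning the key, then checks the MemStore and each sorted, immutable StoreFile; every StoreFile carries a block index of first keys, so a binary search over the index narrows the read to a single block, which is then scanned. A toy sketch of that indexed lookup (illustrative only, with a tiny block size):

```python
import bisect

# Toy StoreFile: sorted (key, value) pairs split into blocks, plus a sparse
# index of each block's first key -- mimicking an HFile's block index.

class ToyStoreFile:
    BLOCK_SIZE = 2  # rows per block (tiny, for illustration)

    def __init__(self, sorted_rows):
        self.blocks = [sorted_rows[i:i + self.BLOCK_SIZE]
                       for i in range(0, len(sorted_rows), self.BLOCK_SIZE)]
        self.index = [blk[0][0] for blk in self.blocks]  # first key per block

    def get(self, key):
        # Binary-search the index to pick the one candidate block, then scan
        # only that block -- this is what turns a lookup into "random access"
        # instead of a sequential scan of the whole file.
        i = bisect.bisect_right(self.index, key) - 1
        if i < 0:
            return None
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None

sf = ToyStoreFile([("row01", "a"), ("row02", "b"), ("row03", "c"), ("row07", "d")])
assert sf.get("row03") == "c"
assert sf.get("row05") is None
```

In real HBase the index lives in the HFile and is cached in memory, so at most one HDFS block read is needed per StoreFile consulted, and Bloom filters can skip files that cannot contain the key at all.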

Lessons from an HBase engineer's production work -- common HBase problems and their analysis

Submitted by 点点圈 on 2019-12-20 11:22:55
Keep the following questions in mind as you read:
1. When HBase runs into a problem, from which angles can it be tackled?
2. Why are some individual HBase requests very slow? What do you think causes this?
3. Why do client read/write requests fail in large numbers? From what angle should this be analyzed?
4. What is the usual cause of a large number of server-side exceptions?
5. Why does the system get slower and slower?
6. Data written to HBase has disappeared; what are the possible causes?
7. A region server aborts; what situation is encountered most often?
8. From which aspects can you judge whether an HBase cluster is healthy?
9. What measures would you take to strengthen HBase security?
In the TCon talk on distributed-system testing practice, the author noted why testers should take part in analyzing production issues:
1. Problem diagnosis during testing builds a large body of experience that applies directly in production.
2. Resolving problems quickly prevents major outages.
3. Production issues help us pinpoint what testing should focus on and where it falls short.
So through day-to-day production maintenance I have accumulated a lot of experience analyzing HBase problems, which I share here; please point out any errors or omissions.
Main tools for problem analysis:
1. Monitoring system: the first stop for judging whether the system's metrics are normal and establishing its current state.
2. Server logs: e.g. region movement history, what actions occurred, and which client requests the server accepted and processed.
3. GC logs: whether garbage-collection behavior is normal.
4. Operating-system logs and commands: whether the OS layer or hardware has failed