HBase

Insert rows into HBase from a Storm bolt

我们两清 submitted on 2019-12-22 09:39:33
Question: I would like to be able to write new entries into HBase from a distributed (not local) Storm topology. A few GitHub projects provide either HBase mappers or pre-made Storm bolts for writing tuples into HBase, and they include instructions for running their samples on the LocalCluster. The problem I am running into with both of these projects, and with accessing the HBase API directly from the bolt, is that they all require the hbase-site.xml file to be included on the
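Not from the post itself, but the usual way around the classpath requirement is to build the client Configuration in code inside the bolt's prepare() method, which runs on the worker after deserialization. A minimal sketch assuming an org.apache.storm 1.x BaseRichBolt and an HBase 1.x client; the ZooKeeper quorum, table, column family, and tuple layout are all placeholders:

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class HBaseInsertBolt extends BaseRichBolt {
    private transient Connection connection; // HBase clients are not serializable,
    private transient Table table;           // so they must be created in prepare()
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
        try {
            // Build the configuration in code rather than relying on an
            // hbase-site.xml being present on every worker's classpath.
            Configuration hbaseConf = HBaseConfiguration.create();
            hbaseConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3"); // placeholder hosts
            connection = ConnectionFactory.createConnection(hbaseConf);
            table = connection.getTable(TableName.valueOf("my_table")); // placeholder
        } catch (Exception e) {
            throw new RuntimeException("Could not connect to HBase", e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            Put put = new Put(Bytes.toBytes(tuple.getString(0)));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                    Bytes.toBytes(tuple.getString(1)));
            table.put(put);
            collector.ack(tuple);
        } catch (Exception e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: no output stream
    }
}
```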

Hadoop+HBase cluster on windows: winutils not found

大憨熊 submitted on 2019-12-22 09:26:26
Question: I'm trying to set up a fully-distributed 4-node dev cluster with Hadoop 2.20 and HBase 0.98 on Windows. I've built Hadoop on Windows successfully and, more recently, built HBase on Windows as well. We have successfully run the wordcount example from the Hadoop installation guide, as well as a custom WebHDFS job. Since fully-distributed HBase on Windows isn't supported yet, I'm running HBase under Cygwin. When trying to start HBase from my master (./bin/start-hbase.sh), I get the following error:
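Although the error text is cut off above, "winutils not found" usually means Hadoop's Shell utilities could not resolve HADOOP_HOME or the hadoop.home.dir system property. A hedged sketch of the common workaround for client-side JVMs; the path is a placeholder and must contain bin\winutils.exe (when launching via shell scripts, exporting HADOOP_HOME in the environment serves the same purpose):

```java
public class WinutilsWorkaround {
    public static void main(String[] args) {
        // Must be set before any Hadoop/HBase class is loaded.
        System.setProperty("hadoop.home.dir", "C:\\hadoop"); // placeholder path
        // ... launch the rest of the application here ...
    }
}
```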

Writing to HBase in MapReduce using MultipleOutputs

限于喜欢 submitted on 2019-12-22 09:01:05
Question: I currently have a MapReduce job that uses MultipleOutputs to send data to several HDFS locations. After it completes, I use HBase client calls (outside of MR) to add some of the same elements to a few HBase tables. It would be nice to add the HBase outputs simply as additional MultipleOutputs, using TableOutputFormat; that way I would distribute my HBase processing. The problem is that I cannot get this to work. Has anyone ever used TableOutputFormat in MultipleOutputs...? With multiple
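For orientation, a sketch of one way to wire this up, assuming an HBase 1.x client; table, family, and named-output names are placeholders. One wrinkle worth noting: TableOutputFormat reads its target table from the job configuration, not from the named output, which limits this setup to a single HBase table per job and is part of what makes the combination awkward:

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MixedOutputReducer extends Reducer<Text, Text, NullWritable, Text> {

    // Driver-side registration (placeholder names):
    public static void configure(Job job) {
        MultipleOutputs.addNamedOutput(job, "text", TextOutputFormat.class,
                NullWritable.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "hbase", TableOutputFormat.class,
                ImmutableBytesWritable.class, Put.class);
        // TableOutputFormat takes its table from the job configuration:
        job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "my_table");
    }

    private MultipleOutputs<NullWritable, Text> mos;

    @Override
    protected void setup(Context ctx) {
        mos = new MultipleOutputs<>(ctx);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
            throws IOException, InterruptedException {
        for (Text v : values) {
            mos.write("text", NullWritable.get(), v);   // plain HDFS output
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"),
                    Bytes.toBytes(v.toString()));
            mos.write("hbase", new ImmutableBytesWritable(), put); // HBase output
        }
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
        mos.close();
    }
}
```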

An Introduction to HBase

邮差的信 submitted on 2019-12-22 07:32:27
Feel free to discuss technical questions with me — Email: jiangxinnju@163.com; Blog: http://www.cnblogs.com/jiangxinnju; GitHub: https://github.com/jiangxincode; Zhihu: https://www.zhihu.com/people/jiangxinnju. Reposted from: http://jiajun.iteye.com/blog/899632. The images in the original were lost; this version restores images, improves the layout, and fixes some errors.

I. Introduction

History:
- Started by Chad Walters and Jim
- 2006.11: Google releases its paper on BigTable
- 2007.2: Initial HBase prototype created as Hadoop contrib
- 2007.10: First usable HBase
- 2008.1: Hadoop becomes an Apache top-level project and HBase becomes a subproject
- 2008.10: HBase 0.18 and 0.19 released

HBase is an open-source clone of Google's BigTable. Built on top of HDFS, it is a database system that provides high reliability, high performance, column-oriented storage, scalability, and real-time reads and writes.

Unable to export a table from HBase

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-22 07:13:43
Question: I am unable to export a table from HBase into HDFS. Below is the error trace; it is quite large. Are there any other ways to export it? I used the command below, and I increased the RPC timeout, but the job still failed.

sudo -u hdfs hbase -Dhbase.rpc.timeout=1000000 org.apache.hadoop.hbase.mapreduce.Export My_Table /hdfs_path

15/05/05 08:50:27 INFO mapreduce.Job: map 0% reduce 0%
15/05/05 08:50:55 INFO mapreduce.Job: Task Id : attempt_1424936551928_0234_m_000001_0, Status : FAILED
Error: org
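Not from the original post, but two knobs that often matter here besides hbase.rpc.timeout: the client scanner timeout, and the scanner caching (fewer rows per RPC means each call returns sooner). A hedged variant of the same command; all of the values are guesses:

```bash
sudo -u hdfs hbase org.apache.hadoop.hbase.mapreduce.Export \
  -D hbase.rpc.timeout=1000000 \
  -D hbase.client.scanner.timeout.period=1000000 \
  -D hbase.client.scanner.caching=100 \
  My_Table /hdfs_path
```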

Export data from HBase shell

烈酒焚心 submitted on 2019-12-22 04:40:24
Question: I'm trying to export data from the HBase shell to a text file which I can parse and add to a MySQL db. I am currently using the following command:

echo "scan 'registration',{COLUMNS=>'registration:status'}" | hbase shell > registration.txt

which exports everything from the HBase shell to registration.txt. How can I remove the shell intro and the summary, and just append the rows of data to the text file? E.g., the shell intro I want to omit:

HBase Shell; enter 'help<RETURN>' for list of supported
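A low-tech filter that usually works (the grep pattern is an assumption based on the standard scan output format, where every data line contains column=): discard stderr and keep only the data rows, appending them to the file:

```bash
echo "scan 'registration',{COLUMNS=>'registration:status'}" \
  | hbase shell 2>/dev/null \
  | grep 'column=' >> registration.txt
```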

HBase Client RPC Timeout

為{幸葍}努か submitted on 2019-12-22 04:35:44
Question: I'm running HBase 1.0.1/Hadoop 2.5.2. I'm trying to run a scan on a table, but I'm getting RPC timeouts. I've changed the HBase RPC timeout to 2 minutes, which I can confirm from the UI...

<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
  <source>hbase-site.xml</source>
</property>

... but my client is still timing out after 60s...

Caused by: java.io.IOException: Call to xxxxxxx/172.16.5.13:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id
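The excerpt shows the server's hbase-site.xml; a client that builds its own Configuration still carries the 60s default, which would explain the mismatch. A sketch of setting the timeout client-side (ConnectionFactory exists as of HBase 1.0; the scanner-timeout line is an assumption that the failing call is a scan):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class ClientWithTimeout {
    public static Connection connect() throws IOException {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.rpc.timeout", 120000);                   // match the server
        conf.setInt("hbase.client.scanner.timeout.period", 120000); // scans check this too
        return ConnectionFactory.createConnection(conf);
    }
}
```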

Integrating Hue with HBase

回眸只為那壹抹淺笑 submitted on 2019-12-22 04:22:05
Installing Hue
For details on building and installing Hue, see my other blog post: https://blog.csdn.net/DataIntel_XiAn/article/details/103543368

Configuring Hue
[hbase]
hbase_clusters=(Cluster|hadoop:9090)
hbase_conf_dir=/home/hadoop/hbase/conf

Starting and testing
First make sure the Thrift service has been started (hbase-daemon.sh start thrift). When viewing HBase data, the connection timed out; the cause is unclear. See hbase.regionserver.thrift.framed — Description: "Use Thrift TFramedTransport on the server side. This is the recommended transport for thrift servers and requires a similar setting on the client side. Changing this to false will select the default transport, vulnerable to DoS when malformed requests are issued due to THRIFT-601." Default: false
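A guess at where to look for the timeout, given the property quoted above: the transport used by the Thrift server and by Hue must match. The hbase-site.xml fragment below simply makes the server-side setting explicit (false is the default); if it is set to true, the client must speak framed transport as well, or calls will hang until they time out. Restart the Thrift daemon after any change:

```xml
<!-- hbase-site.xml on the node running the Thrift server -->
<property>
  <name>hbase.regionserver.thrift.framed</name>
  <value>false</value>
</property>
```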

Enriching SparkContext without incurring in serialization issues

拥有回忆 submitted on 2019-12-22 04:09:09
Question: I am trying to use Spark to process data that comes from HBase tables. This blog post gives an example of how to use the NewHadoopAPI to read data from any Hadoop InputFormat. What I have done: since I will need to do this many times, I was trying to use implicits to enrich SparkContext, so that I can get an RDD from a given set of columns in HBase. I have written the following helper:

trait HBaseReadSupport {
  implicit def toHBaseSC(sc: SparkContext) = new HBaseSC(sc)
  implicit def bytes2string
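For comparison, the NewHadoopAPI route can also be wrapped in a plain helper method instead of a Scala implicit enrichment; since the configuration is built inside the method, the enclosing class holds no state that Spark would need to serialize into task closures. A minimal Java sketch, assuming TableInputFormat is on the classpath (the table name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseRead {
    public static JavaPairRDD<ImmutableBytesWritable, Result> rdd(
            JavaSparkContext sc, String table) {
        // The configuration is created locally and handed straight to
        // newAPIHadoopRDD, so nothing here gets captured in a closure.
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, table); // e.g. "my_table"
        return sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
    }
}
```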

Should the HBase region server and Hadoop data node on the same machine?

旧街凉风 submitted on 2019-12-22 04:04:55
Question: Sorry, I don't have the resources to set up a cluster to test this; I'm just wondering: Can I deploy an HBase region server on a separate machine, other than the Hadoop data node machine? I guess the answer is yes, but I'm not sure. Is it good or bad to deploy the HBase region server and the Hadoop data node on different machines? When putting some data into HBase, where is this data eventually stored — the data node or the region server? I guess it's the data node, but then what are the StoreFile and HFile in