hbase

Joining two ResultSets from HBase in Java?

这一生的挚爱 submitted on 2019-12-10 18:08:45

Question: Is it possible to join two or more result sets retrieved from HBase in Java?

Answer 1: No, it is not possible to join JDBC result sets. However, you can read their results and combine them manually if they are compatible (i.e., if they represent the same entity). EDIT: If you simply need to combine two lists of the same type, you can do list1.addAll(list2);

Source: https://stackoverflow.com/questions/16767188/joining-two-resultsets-from-hbase-in-java
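
As a rough illustration of the suggestion above (combining results client-side rather than joining them), here is a minimal sketch that scans two tables and concatenates their Result lists; the table names are placeholders, not from the original answer:

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class CombineScans {
    // Scan a table and collect every Result into a list.
    static List<Result> scanAll(Connection conn, String tableName) throws Exception {
        List<Result> results = new ArrayList<>();
        try (Table table = conn.getTable(TableName.valueOf(tableName));
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result r : scanner) {
                results.add(r);
            }
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
            List<Result> combined = scanAll(conn, "table1");   // assumed table names
            combined.addAll(scanAll(conn, "table2"));          // "join" by simple concatenation
            System.out.println("total rows: " + combined.size());
        }
    }
}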

ODBC driver for HBase/Phoenix

不打扰是莪最后的温柔 submitted on 2019-12-10 18:06:20

Question: I need to connect Tableau to HBase or Phoenix, and Tableau does not support JDBC. Bummer! I've read about the proprietary Simba driver but haven't seen any reports of people using it. I don't feel like forking over money when it isn't ideal, and my employer feels the same way. Is there another way to connect Tableau to HBase or Phoenix? How are other people doing it? I don't like the idea of using Hive to connect to HBase, because one of the main reasons to move away from Hive is its atrocious

Starting HBASE, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder

只愿长相守 submitted on 2019-12-10 14:22:59

Question: I am trying to start HBase with start-hbase.sh, but I get the error java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder. I have tried adding various .jar files to various folders (as suggested in other threads), but nothing works. I am using Hadoop 3.11 and HBase 2.10. Here is the (end of the) error log: java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster. at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java

How to copy Hbase data to local file system (external drive)

泄露秘密 submitted on 2019-12-10 11:53:34

Question: I want to take a backup of HBase data that is in HDFS. I have an external drive (a USB hard disk). How can I copy data from HBase to my drive? I have used a command like bin/hbase org.apache.hadoop.mapreduce.Drive export table /media/.../mydrive, but what actually happens is that a new directory with the path /media/.../mydrive is created in HDFS, and nothing is saved on my external disk. Why does this happen? Is there a way to specify that the data should be saved to my external drive other than the command
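
The export tool writes its output to the cluster's default filesystem, so a plain path like /media/.../mydrive is interpreted as an HDFS path; the exported directory then has to be pulled down to the locally mounted drive in a separate step. A minimal sketch of that second step using the Hadoop FileSystem API (both paths are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyExportToLocal {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Source: the directory the HBase export job wrote into HDFS (assumed path).
        Path hdfsExportDir = new Path("/user/hbase/export/mytable");
        // Destination: a directory on the locally mounted external drive (assumed path).
        Path localDir = new Path("/media/usb/hbase-backup/mytable");

        FileSystem hdfs = FileSystem.get(conf);
        // Copy the whole export directory from HDFS to the local filesystem.
        hdfs.copyToLocalFile(false /* keep the source */, hdfsExportDir, localDir);
        hdfs.close();
    }
}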

Deleting Columns in HBase

不羁的心 submitted on 2019-12-10 11:53:09

Question: In HBase, will calling the deleteColumn() method (i.e., essentially a schema change to a column family), or deleting column families, result in downtime of the HBase cluster?

Answer 1: The deleteColumn method on a Delete mutation in HBase deletes specific column(s) from a specific row. This is not a schema change, since HBase does not retain schema-level knowledge of the columns of each row (each row can have a different number and different types of columns; think of it as a sparsely populated matrix). The same is
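
To make the distinction concrete, here is a minimal sketch of a Delete mutation that removes a single column from a single row, which is a data operation rather than a schema change; the table, row key, and column names are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DeleteColumnExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("mytable"))) {  // assumed table name
            // Delete only the "info:city" cell of row "row1"; no schema is touched.
            Delete delete = new Delete(Bytes.toBytes("row1"));
            delete.addColumns(Bytes.toBytes("info"), Bytes.toBytes("city"));
            table.delete(delete);
        }
    }
}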

NoSuchMethodError HTableDescriptor.addFamily

℡╲_俬逩灬. submitted on 2019-12-10 11:35:42

Question: I have installed Hadoop 2.5.2 and HBase 1.0.1.1 (which are compatible with each other), but in the Hadoop code I am trying to add a column family to an HBase table. My code is:

Configuration hbaseConfiguration = HBaseConfiguration.create();
Job hbaseImportJob = new Job(hbaseConfiguration, "FileToHBase");
HBaseAdmin hbaseAdmin = new HBaseAdmin(hbaseConfiguration);
if (!hbaseAdmin.tableExists(Config_values.tableName)) {
    TableName tableName1 = TableName.valueOf("tableName");
    HTableDescriptor
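
For context, a NoSuchMethodError at runtime usually points to a mismatch between the HBase client jar the code was compiled against and the one actually on the classpath. As a reference point, here is a minimal sketch of the table-creation flow against the 1.x-era API, where addFamily is an instance method of HTableDescriptor; the table and column-family names are placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTableWithFamily {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName tableName = TableName.valueOf("tableName");      // assumed table name
            if (!admin.tableExists(tableName)) {
                HTableDescriptor tableDesc = new HTableDescriptor(tableName);
                // addFamily adds a column family to the table descriptor before creation.
                tableDesc.addFamily(new HColumnDescriptor("cf"));       // assumed family name
                admin.createTable(tableDesc);
            }
        }
    }
}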

Get Hbase region size via API

人盡茶涼 submitted on 2019-12-10 11:23:27

Question: I am trying to write a balancer tool for HBase that can balance regions across region servers for a table by region count and/or region size (the sum of store file sizes). I could not find any HBase API class that returns the region size or related info. I have already checked a few of the classes that can be used to get other table/region info, e.g. org.apache.hadoop.hbase.client.HTable and HBaseAdmin. I am thinking another way this could be implemented is by using one of the Hadoop
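
One way to get per-region sizes without touching the underlying files is through the cluster status exposed by the admin API (shown here with the 1.x client), which reports per-region store file sizes. A rough sketch, using only the default configuration:

import java.util.Map;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class PrintRegionSizes {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            ClusterStatus status = admin.getClusterStatus();
            for (ServerName server : status.getServers()) {
                ServerLoad serverLoad = status.getLoad(server);
                // One RegionLoad entry per region hosted on this region server.
                for (Map.Entry<byte[], RegionLoad> e : serverLoad.getRegionsLoad().entrySet()) {
                    RegionLoad regionLoad = e.getValue();
                    System.out.println(server + " " + regionLoad.getNameAsString()
                            + " storefileSizeMB=" + regionLoad.getStorefileSizeMB());
                }
            }
        }
    }
}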

HBase Scan TimeRange Does not Work in Scala

放肆的年华 submitted on 2019-12-10 10:39:44

Question: I wrote Scala code to retrieve data based on a time range. Here is my code:

object Hbase_Scan_TimeRange {
  def main(args: Array[String]): Unit = {
    //===Basic Hbase (Non Deprecated)===Start
    Logger.getLogger(this.getClass)
    Logger.getLogger("org").setLevel(Level.ERROR)
    BasicConfigurator.configure()
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin()
    //===Basic Hbase (Non Deprecated)===End
    val scan = new Scan()
    val
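
The Scan API is the same whether it is called from Scala or Java. Here is a minimal Java-style sketch of a time-range scan (table name and timestamps are placeholders); setTimeRange expects epoch milliseconds and treats the upper bound as exclusive, and passing seconds instead of milliseconds is a common reason such scans return nothing:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScanTimeRangeExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("mytable"))) {  // assumed table name
            Scan scan = new Scan();
            // Range is [min, max) in epoch milliseconds.
            long min = 1546300800000L;  // 2019-01-01T00:00:00Z, assumed
            long max = 1548979200000L;  // 2019-02-01T00:00:00Z, assumed
            scan.setTimeRange(min, max);
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    System.out.println(r);
                }
            }
        }
    }
}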

RDD is having only first column value : Hbase, PySpark

佐手、 submitted on 2019-12-10 10:36:01

Question: We are reading an HBase table with PySpark using the following commands:

from pyspark.sql.types import *
host = <Host Name>
port = <Port Number>
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
cmdata_conf = {"hbase.zookeeper.property.clientPort": port, "hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": "CMData", "hbase.mapreduce.scan.columns": "info:Tenure

HBase: a fiasco caused by calling the table-read method incorrectly

冷暖自知 submitted on 2019-12-10 09:49:49

A write-up of a fiasco caused by an incorrect call to an HBase read method

Requirements

The company's databases currently hold GPS coordinate points on the order of hundreds of billions, with a total data volume of tens to hundreds of TB. For each of these coordinates we need to fetch updated service information from sites such as Baidu and AMap, i.e., each coordinate point maps to one description record for that coordinate. If we queried and downloaded these coordinates one point at a time, then under the bandwidth limit of our current HTTP query interface it would take well over a year. A sampling survey showed that the coordinates contain a large amount of duplicated data, with a duplication rate close to 80%. If we cache the coordinate descriptions, then whenever a duplicate coordinate appears we can serve its description from a database lookup instead, saving a large amount of bandwidth. Since this store would hold TB-scale data, HBase was the natural choice.

Plan

Using HBase to store the updated GPS coordinate points is a good fit. The cluster currently has 11 RegionServer machines, and while reading and writing HBase we observed peak read throughput close to 500,000 requests/s. For duplicated GPS coordinate points, reading the update directly from HBase therefore saves a large amount of resources and time.

Architecture 1

Considering that in the future HBase could be used not only as a buffer store for coordinate points but also as a cache for other kinds of information, we proposed the following first architecture. We pulled the caching module out into a standalone rate-limiting service, which is responsible for the multi-threaded concurrent HTTP queries against external resources as well as the reads and writes to HBase. The advantage of this design is that each business application only needs to send HTTP requests to the caching module and does not need to care about the module's internal implementation details, which fully decouples the business applications from one another
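
To make the described flow concrete, here is a minimal, hypothetical sketch of the cache-aside lookup the post describes: check HBase for the coordinate's description, and only on a miss go out over HTTP and write the result back. The table name, column family, row-key scheme, and the stubbed-out HTTP call are all assumptions for illustration:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class GpsDescriptionCache {
    private static final byte[] CF = Bytes.toBytes("d");        // assumed column family
    private static final byte[] COL = Bytes.toBytes("desc");    // assumed qualifier

    // Cache-aside lookup: return the cached description for a coordinate,
    // otherwise fetch it from the remote service and write it back to HBase.
    static String describe(Table table, String coordinate) throws Exception {
        Get get = new Get(Bytes.toBytes(coordinate));
        Result result = table.get(get);
        byte[] cached = result.getValue(CF, COL);
        if (cached != null) {
            return Bytes.toString(cached);                       // cache hit: no HTTP call needed
        }
        String description = fetchFromRemote(coordinate);        // cache miss: rate-limited HTTP query
        Put put = new Put(Bytes.toBytes(coordinate));
        put.addColumn(CF, COL, Bytes.toBytes(description));
        table.put(put);                                          // populate the cache for later duplicates
        return description;
    }

    // Placeholder for the real (rate-limited) HTTP query against Baidu/AMap.
    static String fetchFromRemote(String coordinate) {
        return "description-of-" + coordinate;
    }

    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("gps_cache"))) {  // assumed table name
            System.out.println(describe(table, "116.397128,39.916527"));
        }
    }
}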