HBase

How to fetch all data from an HBase table in Spark

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-20 10:47:27
Question: I have a big table in HBase named UserAction, and it has three column families (song, album, singer). I need to fetch all of the data from the 'song' column family as a JavaRDD object. I tried the code below, but it is not efficient. Is there a better solution?

    static SparkConf sparkConf = new SparkConf().setAppName("test").setMaster("local[4]");
    static JavaSparkContext jsc = new JavaSparkContext(sparkConf);

    static void getRatings() {
        Configuration conf = HBaseConfiguration.create();
        conf
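One common way to make such a read efficient (a sketch of the usual approach, not taken from the original thread) is to push the column-family restriction down into the scan that backs the RDD, so the album and singer families are never read or shipped. This reuses the static jsc from the question:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
    import org.apache.spark.api.java.JavaPairRDD;

    static JavaPairRDD<ImmutableBytesWritable, Result> getSongFamily() {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, "UserAction");
        // Restrict the server-side scan to one family so the other two never leave HBase.
        conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "song");
        return jsc.newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
    }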

Can OLAP be done in BigTable?

喜欢而已 submitted on 2019-12-20 08:24:08
Question: In the past I built web analytics using OLAP cubes running on MySQL. An OLAP cube, the way I used it, is simply a large table (OK, it was stored a bit smarter than that) where each row is basically a measurement or an aggregated set of measurements. Each measurement has a bunch of dimensions (e.g. pagename, useragent, IP) and a bunch of values (e.g. how many pageviews, how many visitors). The queries you run on a table like this are usually of the form (meta
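To make that query shape concrete, here is a hypothetical Java sketch (the Measurement type and all field names are invented for illustration) of the kind of aggregation such a measurement table answers: group by a dimension and sum a value column.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class CubeQuery {
        // Hypothetical cube row: two dimensions plus one value column.
        record Measurement(String pagename, String useragent, long pageviews) {}

        // Equivalent to: SELECT pagename, SUM(pageviews) ... GROUP BY pagename
        static Map<String, Long> pageviewsByPage(List<Measurement> rows) {
            return rows.stream().collect(Collectors.groupingBy(
                    Measurement::pagename,
                    Collectors.summingLong(Measurement::pageviews)));
        }
    }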

Insert JSON into HBase as JSON - Scala

三世轮回 submitted on 2019-12-20 06:01:06
Question: I would like to insert a JSON object into an HBase cell using Scala. At present I am able to insert values using the code below, but I would like to know how I can insert the entire JSON object into an HBase cell.

    import org.apache.hadoop.hbase.util.Bytes

    val hTable: HTable = new HTable(configuration, "tablename")
    val p = new Put(Bytes.toBytes("row1"))
    p.add(Bytes.toBytes("info"), Bytes.toBytes("firstname"), Bytes.toBytes("Jim"))
    hTable.put(p)
    hTable.close()

Answer 1: You can encode your
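The answer is truncated above; presumably it continues by serializing the JSON to a string and storing its bytes in a single cell. A minimal Java sketch of that idea, assuming the table and column family from the question (the "json" qualifier and the payload are invented):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class JsonPut {
        public static void main(String[] args) throws Exception {
            Configuration configuration = HBaseConfiguration.create();
            HTable hTable = new HTable(configuration, "tablename");
            // An HBase cell is just bytes, so a whole JSON document can be one value.
            String json = "{\"firstname\":\"Jim\",\"lastname\":\"Smith\"}";
            Put p = new Put(Bytes.toBytes("row1"));
            p.add(Bytes.toBytes("info"), Bytes.toBytes("json"), Bytes.toBytes(json));
            hTable.put(p);
            hTable.close();
        }
    }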

Excluding the spark-core dependency in CDH

ⅰ亾dé卋堺 submitted on 2019-12-20 05:59:28
Question: I am using Spark Structured Streaming to write data coming from Kafka to HBase. My cluster distribution is Hadoop 3.0.0-cdh6.2.0, and I am using Spark 2.4.0. My code is as below:

    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", bootstrapServers)
      .option("subscribe", topic)
      .option("failOnDataLoss", false)
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .as(Encoders.STRING)

    df.writeStream
      .foreachBatch { (batchDF: Dataset[Row], batchId: Long

Standalone HBase on the local file system getting a ZooKeeper error?

假装没事ソ submitted on 2019-12-20 04:25:17
Question: Hi guys, I am trying to follow the HBase quick start and run HBase on the local file system (without using HDFS). However, when I start the shell using ./hbase shell and type "status", I get a ZooKeeper error:

    hbase(main):001:0> status
    14/01/07 12:44:48 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
    14/01/07 12:44:48 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
    org.apache.zookeeper.KeeperException

How to store and retrieve primitive data types in HBase

非 Y 不嫁゛ submitted on 2019-12-20 04:06:00
Question: How can I store and retrieve primitive data types using the HBase API? My task is to save random events on HBase that contain unpredictable, randomly generated data types, and to retrieve them later whenever I want. Can someone help me with this, please? I am really new to HBase.

Answer 1: This is how you put data into an HBase table:

    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "TABLE_NAME");
    Put p = new Put(rowKey);
    p.add(Bytes
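The answer is cut off above. A small Java sketch of the full round trip the question asks about, using the Bytes utility to encode primitives on write and decode them on read (the table, family, and qualifier names are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PrimitiveRoundTrip {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "TABLE_NAME");

            // Write: every primitive is converted to bytes before it is stored.
            Put p = new Put(Bytes.toBytes("row1"));
            p.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(42));    // int
            p.add(Bytes.toBytes("cf"), Bytes.toBytes("price"), Bytes.toBytes(9.99d)); // double
            table.put(p);

            // Read: decode with the matching Bytes.toXxx call; HBase itself is typeless.
            Result r = table.get(new Get(Bytes.toBytes("row1")));
            int count = Bytes.toInt(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count")));
            double price = Bytes.toDouble(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("price")));
            System.out.println(count + " " + price);

            table.close();
        }
    }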

Big Data Interview Questions

房东的猫 submitted on 2019-12-20 03:38:01
Part 1: Multiple-choice questions

1. Which program is responsible for HDFS data storage? Answer: C, DataNode
   a) NameNode  b) JobTracker  c) DataNode  d) SecondaryNameNode  e) TaskTracker
   The NameNode is responsible for scheduling. For example, to store a 640 MB file split into 64 MB blocks, the NameNode assigns the resulting 10 blocks (ignoring replicas here) to DataNodes in the cluster and records the mappings; when you later download the file, the NameNode knows which nodes to fetch the data from. It mainly maintains two maps: file-to-blocks (which blocks a file is split into) and block-to-nodes (which nodes each block lives on).

2. How many copies of a block does HDFS keep by default? Answer: A, 3 by default
   a) 3  b) 2  c) 1  d) Not fixed

3. Which program usually starts on the same node as the NameNode? Answer: D
   a) SecondaryNameNode  b) DataNode  c) TaskTracker  d) JobTracker
   Analysis: a Hadoop cluster follows the master/slave model. The NameNode and JobTracker belong to the master, while the DataNodes and TaskTrackers belong to the slaves; there is one master and many slaves. The SecondaryNameNode's memory requirement is on the same order of magnitude as the NameNode's.
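As a quick check of the arithmetic in question 1 (a hypothetical Java sketch; the file and block sizes come from the question):

    public class BlockCount {
        public static void main(String[] args) {
            long fileSize = 640L * 1024 * 1024;  // the 640 MB file from the question
            long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block size
            long blocks = (fileSize + blockSize - 1) / blockSize;  // ceiling division
            System.out.println(blocks);      // 10 blocks before replication
            System.out.println(blocks * 3);  // 30 stored copies at the default replication factor of 3
        }
    }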

Create indexes in Solr on top of HBase

旧街凉风 submitted on 2019-12-20 03:28:14
Question: Is there any way I can create indexes in Solr to perform near-real-time full-text search over HBase? I did not want to store the whole text in my Solr indexes, so I made the fields "stored=false". Note: keep in mind that I am working with large datasets and want near-real-time search; we are talking TB/PB of data.

UPDATED
Cloudera Distribution: 5.4.x, with the Cloudera Search components.
Solr: 4.10.x
HBase: 1.0.x
Indexer Service: Lily HBase Indexer with Cloudera Morphlines

Is there
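With stored=false, the usual read path is that Solr returns only the matching row keys, and the document bodies are fetched back from HBase. A hypothetical Java sketch of that pattern (the host, collection, table, family, and field names are all invented; the SolrJ 4.x client API is assumed):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SearchThenFetch {
        public static void main(String[] args) throws Exception {
            // Ask Solr only for row keys; the text itself is not stored in the index.
            HttpSolrServer solr = new HttpSolrServer("http://solrhost:8983/solr/collection1");
            SolrQuery q = new SolrQuery("text:hbase").setFields("id");
            QueryResponse resp = solr.query(q);

            // Fetch the full documents back from HBase by row key.
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "documents");
            for (SolrDocument doc : resp.getResults()) {
                byte[] rowKey = Bytes.toBytes((String) doc.getFieldValue("id"));
                Result r = table.get(new Get(rowKey));
                System.out.println(Bytes.toString(
                        r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("text"))));
            }
            table.close();
        }
    }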

How to obtain Phoenix table data via HBase REST service

戏子无情 submitted on 2019-12-20 03:14:01
Question: I created an HBase table using the Phoenix JDBC driver in the following code snippet:

    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    Connection conn = DriverManager.getConnection("jdbc:phoenix:serverurl:/hbase-unsecure");
    System.out.println("got connection");
    conn.createStatement().execute(
        "CREATE TABLE IF NOT EXISTS phoenixtest (id BIGINT not null primary key, test VARCHAR)");
    int inserted = conn.createStatement().executeUpdate(
        "UPSERT INTO phoenixtest VALUES (5, '13%')");
    conn
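Reading the rows back through Phoenix itself, rather than the raw HBase REST service, lets Phoenix decode its own column encodings (raw HBase would hand back the encoded bytes, e.g. a BIGINT stored as a sign-flipped 8-byte value). A minimal JDBC sketch reusing the conn from the question:

    import java.sql.ResultSet;

    ResultSet rs = conn.createStatement().executeQuery("SELECT id, test FROM phoenixtest");
    while (rs.next()) {
        // Phoenix maps SQL types back to Java types for us.
        System.out.println(rs.getLong("id") + " -> " + rs.getString("test"));
    }
    rs.close();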