Cloudera

beeline not able to connect to hiveserver2

Submitted by 梦想的初衷 on 2019-12-20 19:13:04
Question: I have a CDH 5.3 instance. I start HiveServer2 by first starting the Hive metastore and then the HiveServer2 process from the command line. After this I use Beeline to connect to my HiveServer2, but apparently it is not able to do so: Could not open connection to jdbc:hive2://localhost:10000: java.net.ConnectException: Connection refused (state=08S01,code=0). Another issue: I tried to see if HiveServer2 was listening on port 10000. I ran "sudo netstat -tulpn | grep :10000", but none of the…
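
A "Connection refused" on port 10000 almost always means HiveServer2 is not up (or is bound to another port), so a programmatic check can help narrow it down. Below is a minimal sketch of the same connection Beeline attempts, assuming the Hive JDBC driver is on the classpath and HiveServer2 is running without authentication:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class HiveServer2Check {
        public static void main(String[] args) throws Exception {
            // Register the Hive JDBC driver (hive-jdbc and its dependencies
            // must be on the classpath).
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            // The same URL Beeline uses; empty user/password assumes no auth.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://localhost:10000/default", "", "")) {
                System.out.println("Connected: " + !conn.isClosed());
            } catch (SQLException e) {
                // "Connection refused" here means nothing is listening on 10000.
                System.err.println("Could not connect: " + e.getMessage());
            }
        }
    }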

Maven - Different Dependency Version in Test

Submitted by 淺唱寂寞╮ on 2019-12-20 04:21:41
Question: I'm running into an issue similar to "Maven 2 - different dependency versions in test and compile", but the answer specified there does not work. In my project I need to depend on a Cloudera distribution of Hadoop, and on a 'vanilla' version for JUnit testing, as the former only works on *nix. When I try to execute my application, I get Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration. When I run JUnit tests from Maven or Eclipse, everything works fine.
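
One common way to handle this kind of split (a sketch, not the fix from the linked question) is to parameterize the Hadoop version with a Maven property, so the build uses the CDH artifacts by default and a vanilla version can be substituted from the command line. The property name, versions, and artifact ID below are illustrative:

    <properties>
      <!-- Default to the CDH build (requires the Cloudera Maven repository). -->
      <hadoop.version>2.5.0-cdh5.3.0</hadoop.version>
    </properties>

    <dependencies>
      <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
      </dependency>
    </dependencies>

Running mvn test -Dhadoop.version=2.5.0 would then resolve the vanilla Apache artifact for local testing, since properties given on the command line override the POM default.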

Big Data Interview Questions

Submitted by 房东的猫 on 2019-12-20 03:38:01
Part 1: Multiple-choice questions

1. Which program is responsible for HDFS data storage? Answer: C, DataNode.
a) NameNode  b) JobTracker  c) DataNode  d) SecondaryNameNode  e) TaskTracker
The NameNode is responsible for scheduling. For example, if you store a 640 MB file with a 64 MB block size, the NameNode assigns the resulting 10 blocks (ignoring replicas here) to DataNodes across the cluster and records the mapping; when you later download the file, the NameNode knows which nodes to fetch the blocks from. It mainly maintains two maps: files to blocks, and blocks to nodes (which blocks a file is split into, and which nodes those blocks live on).

2. How many copies of each HDFS block are kept by default? Answer: A, 3 by default.
a) 3  b) 2  c) 1  d) not fixed

3. Which program is usually started on the same node as the NameNode? Answer: D.
a) SecondaryNameNode  b) DataNode  c) TaskTracker  d) JobTracker
Analysis: a Hadoop cluster follows the master/slave model. The NameNode and JobTracker belong to the master; the DataNode and TaskTracker belong to the slaves. There is only one master, while there are many slaves. The SecondaryNameNode's memory requirement is on the same order of magnitude as the NameNode's.
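
As a sanity check on the arithmetic in question 1, here is a small sketch of the block math (64 MB is the old HDFS default block size; replication multiplies the stored copies, not the logical block count):

    public class BlockMath {
        public static void main(String[] args) {
            long fileSize = 640L * 1024 * 1024;  // the 640 MB file from question 1
            long blockSize = 64L * 1024 * 1024;  // 64 MB block size
            int replication = 3;                 // default replication (question 2)

            long blocks = (fileSize + blockSize - 1) / blockSize; // ceiling division
            System.out.println("Logical blocks:  " + blocks);               // 10
            System.out.println("Stored replicas: " + blocks * replication); // 30
        }
    }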

Create indexes in solr on top of HBase

Submitted by 旧街凉风 on 2019-12-20 03:28:14
Question: Is there any way I can create indexes in Solr to perform near-real-time full-text search over HBase? I didn't want to store the whole text in my Solr indexes, so I made the fields "stored=false". Note: keep in mind that I am working on large datasets and want near-real-time search; we are talking TB/PB of data. UPDATE: Cloudera Distribution 5.4.x is used with the Cloudera Search components: Solr 4.10.x, HBase 1.0.x, indexer service: Lily HBase Indexer with Cloudera Morphlines. Is there…
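
For reference, "stored=false" is a per-field setting in the Solr schema: with indexed="true" the field remains searchable, but its original text is not kept in the index. A sketch of what such definitions might look like in schema.xml (the field and type names are illustrative):

    <!-- Searchable but not retrievable: the full text lives only in HBase. -->
    <field name="body" type="text_general" indexed="true" stored="false"/>
    <!-- The HBase row key is typically stored so that matches can be
         fetched back from HBase after the search. -->
    <field name="id" type="string" indexed="true" stored="true" required="true"/>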

Running java hadoop job on local/remote cluster

Submitted by 心不动则不痛 on 2019-12-19 10:56:11
Question: I'm trying to run a Hadoop job on a local/remote cluster. In the future this job will be executed from a web application. I'm trying to execute this piece of code from Eclipse:

    public class TestHadoop {
        private final static String host = "localhost";

        public static void main(String[] args)
                throws IOException, InterruptedException, ClassNotFoundException {
            run();
        }

        static void run()
                throws IOException, InterruptedException, ClassNotFoundException {
            Configuration conf = new Configuration();
            // run on…
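
The snippet is cut off exactly where the cluster configuration would go. A hedged completion of run(), assuming MRv1 with CDH default ports (8020 for HDFS, 8021 for the JobTracker); host names and ports will differ per installation:

    // Inside run(); requires org.apache.hadoop.conf.Configuration and
    // org.apache.hadoop.mapreduce.Job.
    Configuration conf = new Configuration();
    // Point the client at the remote cluster instead of the local filesystem.
    conf.set("fs.defaultFS", "hdfs://" + host + ":8020");
    conf.set("mapred.job.tracker", host + ":8021");

    Job job = Job.getInstance(conf, "test-job");
    // Ship the job classes to the cluster; without this, remote tasks fail
    // with ClassNotFoundException.
    job.setJarByClass(TestHadoop.class);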

Unable to start daemons using start-dfs.sh

Submitted by 半腔热情 on 2019-12-19 08:01:05
Question: We are using the cdh4.0.0 distribution from Cloudera. We are unable to start the daemons using the command below:

    > start-dfs.sh
    Starting namenodes on [localhost]
    hduser@localhost's password:
    localhost: mkdir: cannot create directory `/hduser': Permission denied
    localhost: chown: cannot access `/hduser/hduser': No such file or directory
    localhost: starting namenode, logging to /hduser/hduser/hadoop-hduser-namenode-canberra.out
    localhost: /home/hduser/work/software/cloudera/hadoop-2.0.0-cdh4.0.0…
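
The failed mkdir `/hduser' suggests the scripts are expanding a log/PID directory variable against the filesystem root ("logging to /hduser/hduser/..." points the same way). One hedged guess at a fix is to set those directories to a writable location in hadoop-env.sh; the paths below are illustrative:

    # In hadoop-env.sh (location varies by packaging), send daemon logs and
    # PID files somewhere the hduser account can actually write:
    export HADOOP_LOG_DIR=/home/hduser/hadoop/logs
    export HADOOP_PID_DIR=/home/hduser/hadoop/pids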

hdfs data node disconnected from namenode

Submitted by 笑着哭i on 2019-12-18 16:55:09
Question: From time to time I get the following errors in Cloudera Manager: "This DataNode is not connected to one or more of its NameNode(s)." and "The Cloudera Manager agent got an unexpected response from this role's web server." (usually together, sometimes only one of them). In most references to these errors on Stack Overflow and Google, the issue is a configuration problem (and the data node never connects to the name node). In my case the data nodes usually connect at startup, but lose the connection after…
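
When the connection drops only after a healthy start, the DataNode log around the disconnection time is usually more telling than the configuration. Two standard checks (the log path and file name below are typical defaults and may differ per installation):

    # Look for heartbeat timeouts or long GC pauses near the disconnection time.
    sudo tail -n 200 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-$(hostname).log
    # Ask the NameNode which DataNodes it currently considers live or dead.
    sudo -u hdfs hdfs dfsadmin -report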

Searching over documents stored in Hadoop - which tool to use?

Submitted by 前提是你 on 2019-12-18 12:38:22
Question: I'm lost among Hadoop, HBase, Lucene, Carrot2, Cloudera, Tika, ZooKeeper, Solr, Katta, Cascading, POI... When you read about any one of them, you can often be sure that each of the other tools will be mentioned. I don't expect you to explain every tool to me, of course. If you could help me narrow this set down for my particular scenario, that would be great. So far I'm not sure which of the above will fit, and it looks like (as always) there is more than one way of doing what needs to be done. The…

hadoop - map reduce task and static variable

Submitted by 戏子无情 on 2019-12-18 07:06:06
Question: I just started working on a Hadoop/HBase MapReduce job (using Cloudera) and I have the following question. Let's say we have a Java class with a main and a static variable. The class defines inner classes corresponding to the Mapper and Reducer tasks. Before launching the job, main initializes the static variable. This variable is read in the Mapper class. The class is then launched using 'hadoop jar' on a cluster. My question: I don't see how map and reduce tasks on other nodes can see…
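
They can't: the static variable is set in the JVM that runs main(), while map and reduce tasks run in separate JVMs on other nodes, which only ever see the class's default value. The usual way to pass a small value along is the job Configuration, which is serialized and shipped with the job. A minimal sketch (names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class StaticVarJob {
        static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
            private String param;

            @Override
            protected void setup(Context context) {
                // Read the value back on whatever node the task runs on.
                param = context.getConfiguration().get("my.param", "default");
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text(param), value);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Set in the launcher JVM instead of a static field.
            conf.set("my.param", "value-computed-in-main");
            Job job = Job.getInstance(conf, "static-var-demo");
            job.setJarByClass(StaticVarJob.class);
            job.setMapperClass(MyMapper.class);
            // ...input/output paths and reducer setup omitted for brevity.
        }
    }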