Cloudera

Hadoop YARN job is getting stuck at map 0% and reduce 0%

烂漫一生 submitted on 2019-12-11 01:49:23
Question: I am trying to run a very simple job to test my Hadoop setup, so I tried the Word Count example, which got stuck at 0%. I then tried some other simple jobs, and each one of them got stuck the same way:

```
14/07/14 23:55:51 INFO mapreduce.Job: Running job: job_1405376352191_0003
14/07/14 23:55:57 INFO mapreduce.Job: Job job_1405376352191_0003 running in uber mode : false
14/07/14 23:55:57 INFO mapreduce.Job: map 0% reduce 0%
```

I am using Hadoop version 2.3.0-cdh5.0.2. I did quick research on Google
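A job that hangs at map 0% reduce 0% very often means YARN never allocated containers for the tasks, typically because the NodeManager's advertised memory is too small for the requested container sizes. A minimal sketch of the relevant yarn-site.xml knobs, assuming memory is the bottleneck; the values here are illustrative only and must be tuned to the node's actual RAM:

```xml
<!-- yarn-site.xml: illustrative values, assuming a small single node. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value> <!-- total memory the NodeManager may hand out -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>  <!-- smallest container the scheduler will grant -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value> <!-- largest single container request allowed -->
</property>
```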

fs.defaultFS only listens to localhost's port 8020

倾然丶 夕夏残阳落幕 submitted on 2019-12-11 01:14:19
Question: I have a CDH4.3 all-in-one VM up and running, and I am trying to install a Hadoop client remotely. I noticed that, without changing any default settings, my Hadoop cluster is listening on 127.0.0.1:8020.

```
[cloudera@localhost ~]$ netstat -lent | grep 8020
tcp  0  0 127.0.0.1:8020  0.0.0.0:*  LISTEN  492  100202
[cloudera@localhost ~]$ telnet ${all-in-one vm external IP} 8020
Trying ${all-in-one vm external IP}...
telnet: connect to address ${all-in-one vm external IP} Connection refused
[cloudera
```
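The usual cause is that fs.defaultFS resolves to localhost, so the NameNode binds only to the loopback interface. A minimal sketch of the fix in core-site.xml, assuming the VM's routable hostname is vm.example.com (a placeholder) and that /etc/hosts does not map that hostname back to 127.0.0.1:

```xml
<!-- core-site.xml: vm.example.com is a placeholder for the VM's routable hostname. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://vm.example.com:8020</value>
</property>
```

After changing this, restart HDFS and re-run the netstat check to confirm port 8020 is now bound to the external address rather than 127.0.0.1.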

HBase: /hbase/meta-region-server node does not exist

旧巷老猫 submitted on 2019-12-11 00:38:23
Question: I have installed Cloudera with HDFS, MapReduce, ZooKeeper, and HBase on it: 4 nodes running these services (3 of them ZooKeeper). All were installed by the Cloudera wizard and show no configuration issues in Cloudera. On connecting from Java I get an error:

```
09:32:23.020 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server /172.20.7.6:2181
09:32:23.020 [main] INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper - Process identifier=hconnection-0x301abf87 connecting
```
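This error usually means the client reached ZooKeeper but did not find the znode it expected there. Two usual suspects: a client/server HBase version mismatch (0.96+ clients look for /hbase/meta-region-server, while 0.94-era servers publish /hbase/root-region-server), or a wrong zookeeper.znode.parent or quorum on the client side. A minimal sketch for pinning the client configuration explicitly, assuming placeholder hostnames and an HBase 0.96+ client on the classpath:

```java
// Sketch only: hostnames are placeholders; use the quorum your cluster runs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseConnCheck {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");
        conf.set("hbase.zookeeper.property.clientPort", "2181");
        // Must match the server side; if it does not, the client fails with
        // "/hbase/meta-region-server node does not exist".
        conf.set("zookeeper.znode.parent", "/hbase");
        System.out.println("quorum = " + conf.get("hbase.zookeeper.quorum"));
    }
}
```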

Running any Hadoop command fails after enabling security.

末鹿安然 submitted on 2019-12-10 21:29:48
Question: I was trying to enable Kerberos for my CDH 4.3 (via Cloudera Manager) test bed. After changing authentication from Simple to Kerberos in the web UI, I am unable to do any Hadoop operations, as shown below. Is there any way to specify the keytab explicitly?

```
[root@host-dn15 ~]# su - hdfs
-bash-4.1$ hdfs dfs -ls /
13/09/10 08:15:35 ERROR security.UserGroupInformation: PriviledgedActionException as:hdfs (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by
```
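Once Kerberos is enabled, every Hadoop command needs a valid ticket in the invoking user's credential cache, and a keytab lets you obtain one non-interactively with kinit -kt. A sketch, where the principal, realm, and keytab path are placeholders to be replaced with your cluster's actual values:

```
# Obtain a ticket from a keytab (principal, realm, and path are placeholders).
kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/host-dn15.example.com@EXAMPLE.COM
# Confirm the ticket was granted, then retry the failing command.
klist
hdfs dfs -ls /
```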

Comma delimited string to individual rows - Impala SQL

北慕城南 submitted on 2019-12-10 21:26:08
Question: Let's suppose we have a table:

```
Owner | Pets
------------------------------
Jack  | "dog, cat, crocodile"
Mary  | "bear, pig"
```

I want to get as a result:

```
Owner | Pets
------------------------------
Jack  | "dog"
Jack  | "cat"
Jack  | "crocodile"
Mary  | "bear"
Mary  | "pig"
```

I found some solutions to similar problems by googling, but Impala SQL does not offer the capabilities needed to apply the suggested solutions. Any help would be greatly appreciated!

Answer 1: The following works in Impala: split_part
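The answer excerpt cuts off right after split_part. A common pattern built on that function (a sketch, not the verbatim answer, assuming an Impala version that ships split_part() and a known upper bound on list length; my_table, owner, and pets are placeholder names) is to cross-join the table against a small inline numbers view and pull out the n-th element:

```sql
-- Sketch: assumes at most 3 comma-separated items per row;
-- extend the inline numbers view for longer lists.
SELECT t.owner,
       trim(split_part(t.pets, ',', n.n)) AS pet
FROM my_table t
CROSS JOIN (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3) n
WHERE split_part(t.pets, ',', n.n) IS NOT NULL
  AND split_part(t.pets, ',', n.n) <> '';  -- drop out-of-range positions
```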

Copy Solr HDFS Data to another Cluster

十年热恋 submitted on 2019-12-10 20:07:33
Question: I have a SolrCloud (v4.10) installation that sits on top of Cloudera (CDH 5.4.2) HDFS, with 3 Solr instances each hosting a shard of each core. I am looking for a way to incrementally copy the Solr data from our production cluster to our development cluster. There are 3 cores, but I am only interested in copying one of them. I have tried to use the Solr replication backup and restore, but that doesn't seem to load anything into the dev cluster.

http://host:8983/solr/core/replication?command
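Since the index files live in HDFS, one approach worth trying (a sketch with placeholder NameNode addresses and paths, and assuming the target collection is stopped or quiesced so the index is not written mid-copy) is DistCp, whose -update flag only transfers files that changed, giving incremental behavior:

```
# Incremental copy of one core's HDFS index between clusters (paths are placeholders).
hadoop distcp -update \
  hdfs://prod-nn:8020/solr/collection1/core_node1/data \
  hdfs://dev-nn:8020/solr/collection1/core_node1/data
```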

Sqoop job fails with KiteSDK validation error for Oracle import

僤鯓⒐⒋嵵緔 submitted on 2019-12-10 19:46:39
Question: I am attempting to run a Sqoop job to load from an Oracle DB into Parquet format on a Hadoop cluster. The job is incremental. Sqoop version is 1.4.6, Oracle version is 12c, and Hadoop version is 2.6.0 (distro is Cloudera 5.5.1). The Sqoop command (this creates the job, and executes it) is:

```
$ sqoop job -fs hdfs://<HADOOPNAMENODE>:8020 \
    --create myJob \
    -- import \
    --connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
    --username <USERNAME> \
    -P \
    --as-parquetfile \
    --table <USERNAME>.
```
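For context, a frequently cited cause of KiteSDK validation errors on Oracle imports is the schema-qualified table name: with --as-parquetfile, Sqoop 1.4.6 derives a Kite dataset name from --table, and the dot in OWNER.TABLE fails Kite's name validation. A hedged sketch of the commonly suggested workaround, unverified against this exact setup (connect as the schema owner and drop the qualifier; <TABLE> is a placeholder):

```
# Workaround sketch: drop the schema qualifier so the Kite dataset name has no dot.
sqoop import \
  --connect jdbc:oracle:thin:@<DBHOST>:<DBPORT>/<DBNAME> \
  --username <USERNAME> -P \
  --as-parquetfile \
  --table <TABLE>
```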

hadoop, python, subprocess failed with code 127

谁说胖子不能爱 submitted on 2019-12-10 15:18:04
Question: I'm trying to run a very simple task with MapReduce.

mapper.py:

```python
#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line
```

My txt file:

```
qwerty
asdfgh
zxc
```

Command line to run the job:

```
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.8.0.jar \
  -input /user/cloudera/In/test.txt \
  -output /user/cloudera/test \
  -mapper /home/cloudera/Documents/map.py \
  -file /home/cloudera/Documents/map.py
```

Error:

```
INFO mapreduce.Job: Task Id : attempt_1490617885665
```
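Exit code 127 is the shell's "command not found": the streaming child could not execute the mapper, usually because the shebang line is broken by Windows line endings or the script is not executable. A sketch of the usual checks, using the paths from the question (dos2unix may need to be installed first):

```
# Strip CR characters that silently break the "#!/usr/bin/env python" shebang.
dos2unix /home/cloudera/Documents/map.py
# Make sure the script is executable.
chmod +x /home/cloudera/Documents/map.py
# Alternatively, bypass the shebang by naming the interpreter explicitly:
#   -mapper "python map.py"
```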

On-Premises vs Cloud: Who Will Be the Last Survivor of the Big Data Battle? — An InfoQ Interview with Guan Tao, Head of Alibaba Cloud Intelligence's General-Purpose Computing Platform

℡╲_俬逩灬. submitted on 2019-12-10 05:14:17
Abstract: Is on-premises big data service counting down to extinction? Will cloud big data services ultimately converge on multi-cloud, hybrid cloud, or a single public cloud? Is it a misconception or a fact that as cluster sizes grow, the cost of moving to the cloud becomes unbearable? InfoQ put these questions to Guan Tao, head of Alibaba Cloud Intelligence's general-purpose computing platform, in an exclusive interview. Author: Zhao Yuying. Original title: On-Premises vs Cloud: Who Will Be the Last Survivor of the Big Data Battle?

When does an enterprise decide to move to the cloud? In the past, the answer might have been: when an enterprise finds it needs to buy new hardware and commit to a new round of capital investment, it tends to consider an alternative such as the cloud, largely for cost reasons; or, when the enterprise has some elastic computing demand, a cloud platform is an excellent way to "shave the peaks" of IT resource usage.

Today, beyond "replacement" within existing technical boundaries, the answer gains another entry: "expansion" of the technical boundary. When an enterprise needs a capability such as AI or big data, but its own technical strength falls short or its core competitiveness does not lie in technology itself, it may consider moving to the cloud; this has even become an important reason many enterprises choose cloud platforms. By choosing a cloud platform, an enterprise expands its own technical boundary, and thereby gains the technical backing to expand its business boundary.

Over the past few years, cloud big data services have grown increasingly mature; on this front alone, mainstream cloud vendors now offer dozens of services, while the voice of on-premises big data services seems ever weaker, especially after the merger of Cloudera and Hortonworks. Some analysts have pointed out that the convergence of Hadoop with streaming technologies such as Spark/Flink has already happened on cloud platforms, which has left Cloudera and

Where is Mapper output saved in Hadoop?

拥有回忆 submitted on 2019-12-09 21:18:21
Question: I am interested in efficiently managing Hadoop's shuffle traffic and utilizing the network bandwidth effectively. To do this, I want to know how much shuffle traffic is generated by each DataNode. Shuffle traffic is nothing but the output of the mappers, so where is this mapper output saved? How can I get the size of the mapper output from each DataNode in real time? I appreciate your help. I have created a directory to store this mapper output, as below:

```
<property>
  <name>mapred.local.dir</name>
```
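Mapper output is intermediate data: it is spilled to local disk under mapred.local.dir on each node, not written to HDFS. There is no built-in real-time, per-DataNode shuffle gauge, but per-job totals are exposed as counters; a sketch of reading one from the CLI (the job ID is a placeholder):

```
# Total bytes emitted by all map tasks of one job (job ID is a placeholder).
hadoop job -counter <job_id> \
  org.apache.hadoop.mapreduce.TaskCounter MAP_OUTPUT_BYTES
```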