Cloudera

YARN UNHEALTHY nodes

牧云@^-^@ Submitted on 2019-12-09 17:16:31
Question: In our YARN cluster, which is 80% full, we are seeing some of the YARN NodeManagers marked as UNHEALTHY. After digging into the logs I found it is because disk usage has reached 90% on the data dirs, with the following error: 2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4 local-dirs are bad: /data3/yarn/nm,/data2/yarn/nm,/data4/yarn/nm,/data1/yarn/nm; 2015-02-21 08:33:51,590 INFO org.apache.hadoop
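The NodeManager's disk health checker marks a local-dir as bad once its utilization crosses a threshold (90% by default), and the node goes UNHEALTHY when enough dirs are bad. One common stopgap, sketched below for yarn-site.xml, is to raise that threshold while disk space is being reclaimed; the value 95.0 is only an illustration, not a recommendation:

```xml
<!-- yarn-site.xml: raise the per-disk utilization cutoff used by the
     NodeManager disk health checker (default is 90.0 percent).
     95.0 is an example value chosen for illustration. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```

Freeing space under the local-dirs (and restarting the NodeManager) is the real fix; raising the threshold only buys time before the disks actually fill.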

Virtual machine “Cloudera quick start” not booting

你。 Submitted on 2019-12-09 16:30:50
Question: I have recently downloaded the "QuickStart VM" from http://www.cloudera.com (specifically, the VirtualBox version). This virtual machine uses CentOS (and my computer is a MacBook Air). I cannot fully start this virtual machine, and I do not know why. I have attached a screenshot of the furthest point the boot reaches. Answer 1: I've discovered that when your screen appears to be frozen at that location, pressing [ESC] is apparently what you're supposed to do next. Mine was there, sitting there for a few

Install Hue without Cloudera

这一生的挚爱 Submitted on 2019-12-09 14:25:34
Question: Has anyone tried or succeeded in installing Hue on Hadoop without Cloudera? I have gotten to a point where I can reliably reproduce a Hadoop cluster with HBase and Hive and can set it all up in about 15 minutes. I'd love to have Hue along with all this without having to go back and redo my setup with Cloudera. Answer 1: Check out slides #19 & #5; Hue is getting everywhere and is compatible with Hadoop 0.20 / 1.2.0 / 2.2.0: http://gethue.com/hue-goes-to-paris-hug-france/ Hue has tarball releases

Unable to start CDH4 secondary name node: Invalid URI for NameNode address

…衆ロ難τιáo~ Submitted on 2019-12-08 20:23:02
Question: I've been trying to set up a CDH4 installation of Hadoop. I have 12 machines, labeled hadoop01 - hadoop12, and the namenode, job tracker, and all data nodes have started fine. I'm able to view dfshealth.jsp and see that it has found all the data nodes. However, whenever I try to start the secondary name node it throws an exception: Starting Hadoop secondarynamenode: [ OK ] starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-hadoop02.dev.terapeak.com.out
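An "Invalid URI for NameNode address" at secondarynamenode startup usually means the secondary host cannot derive the NameNode URI from its own configuration, typically because core-site.xml on that machine is missing the default filesystem entry. A hedged sketch of the relevant entry is below; hadoop01:8020 is a placeholder guessed from the machine naming in the question, and on older CDH4 configs the equivalent key is fs.default.name:

```xml
<!-- core-site.xml on the secondarynamenode host (hadoop02 here).
     hdfs://hadoop01:8020 is a placeholder; substitute the real
     NameNode hostname and RPC port for this cluster. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop01:8020</value>
</property>
```

It is worth diffing this file against the copy on a working data node, since Cloudera-managed and hand-rolled configs can drift per host.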

CDH4: Version conflict: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

寵の児 Submitted on 2019-12-08 20:05:57
I'm trying to upgrade from CDH3 to CDH4 and am getting a version conflict from compile to run time. I'm getting this error: Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected From googling it seems that my code is being compiled against Hadoop 1.x and is running on Hadoop 2.0. I'm compiling and running the app on the same Hadoop client, so it should all be Hadoop 2.0. Here's what I get from running "hadoop version" on the client or any of the other nodes in this test cluster: Hadoop 2.0.0-cdh4.4.0
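The interface-vs-class error on org.apache.hadoop.mapreduce.Counter is the classic signature of a job jar compiled against Hadoop 1.x running on Hadoop 2.x (MRv2), where Counter changed from a class to an interface. If the build uses Maven, a hedged fix is to compile against the CDH4 artifacts; the version below matches the "hadoop version" output in the question, and the repository URL is Cloudera's public Maven repository:

```xml
<!-- pom.xml fragment: build against CDH4 (Hadoop 2.x) artifacts
     instead of stock Apache Hadoop 1.x jars. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.4.0</version>
    <!-- provided: the cluster supplies Hadoop jars at run time -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Also check that no stale Hadoop 1.x jar is bundled inside the job jar or left on the client classpath, since a leftover jar reproduces the same error even after a clean rebuild.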

Hadoop Overview

馋奶兔 Submitted on 2019-12-08 19:01:47
Reposted from: http://baike.baidu.com/link?url=HwhPVuqqWelWIr0TeSBGPZ5SjoaYb5_Givp9-rJN-PYbSTMlwpECSKvjzLBzUE7hn9VvmhDoKb5NNCPw1pCsTa Hadoop is a distributed-system infrastructure developed by the Apache Foundation. Users can develop distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. [1] Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high throughput for accessing application data, making it well suited to applications with very large data sets. HDFS relaxes certain POSIX requirements in order to allow streaming access to data in the file system. The core of Hadoop's design is HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over it. [2]

Big-Data Analysis with Hadoop and Python

断了今生、忘了曾经 Submitted on 2019-12-08 18:59:57
Big data: 1. Distributed: a master node (Master) and worker nodes (Slaves). 2. Cluster (multiple machines): stores data and processes it in parallel. 3. Distributed computing, whose core idea is divide and conquer. I. Hadoop 1. Apache Hadoop. Introduction: a tool for distributed, parallel processing of data across multiple servers; it can scale out to arbitrarily large data volumes, which is how it addresses big-data workloads. Characteristics: scalability, flexibility, fault tolerance, and low cost. Function: Apache Hadoop is a 100% open-source framework with two main functions: (1) storing big data, and (2) processing big data. 2. Key Hadoop modules. (1) HDFS: stores massive data in a distributed fashion by splitting large files into small blocks (128 MB by default). (2) YARN: manages cluster resources (memory and CPU cores) and allocates them to running applications such as MapReduce and Spark. (3) MapReduce: a framework for analyzing massive data. Its idea is divide and conquer: split a large data file into many small pieces, start one Map task to process each piece, and when they finish, start a Reduce task to merge the results of all the Map tasks. 3. Hadoop module workflow. (1) HDFS (data storage): stores data in a distributed way, splitting large files into small Block files kept on the disks of the cluster's nodes; each block is kept in three replicas. (2) YARN (resource management): in YARN
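The MapReduce flow described above (one Map task per data chunk, then a Reduce task merging all Map outputs) can be sketched in plain Python with a word count. This is only an in-process illustration of the divide-and-conquer idea, not the Hadoop API; the chunk strings stand in for HDFS blocks:

```python
from collections import defaultdict
from itertools import chain

def map_task(chunk):
    # Map phase: one task per chunk, emitting (word, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_task(pairs):
    # Reduce phase: merge every mapper's output by key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Two "blocks" of input, as if a large file had been split.
chunks = ["big data big cluster", "data divide and conquer"]
mapped = [map_task(c) for c in chunks]             # one Map task per chunk
result = reduce_task(chain.from_iterable(mapped))  # one Reduce task merges all
print(result)
```

In real Hadoop the Map tasks run on the nodes holding the blocks and a shuffle phase routes pairs to reducers by key, but the data flow is the same shape.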

Livy Server: return a dataframe as JSON?

∥☆過路亽.° Submitted on 2019-12-08 16:33:33
Question: I am executing a statement in Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements with the following body { "code": "spark.sql(\"select * from test_table limit 10\")" } I would like an answer in the following format (...) "data": { "application/json": "[ {"id": "123", "init_date": 1481649345, ...}, {"id": "133", "init_date": 1481649333, ...}, {"id": "155", "init_date": 1481642153, ...}, ]" } (...) but what I'm getting is (...) "data": { "text/plain": "res0: org.apache
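Livy's statement output can carry an application/json mime bundle via its %json output magic: assign the result to a variable and reference that variable after %json. A hedged sketch of the POST body for a Scala (spark) session is below; test_table comes from the question, result is an arbitrary variable name, and whether a toJSON step is needed first can vary by Livy version:

```json
{
  "code": "val result = spark.sql(\"select * from test_table limit 10\").toJSON.collect\n%json result"
}
```

With a plain statement and no magic, Livy returns only the REPL's text/plain echo, which is exactly the truncated res0 output shown above.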

OOZIE : Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]

邮差的信 Submitted on 2019-12-08 11:19:27
Question: I'm trying to execute an Oozie job following this tutorial: https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html When I run oozie job -run -config target/example/job.properties I get this error: Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1 Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry

Is Apache Knox Gateway compatible with Cloudera 4.5?

随声附和 Submitted on 2019-12-08 09:48:03
Question: I'm currently working on an upcoming project involving a Hadoop cluster, and I need information about securing the cluster. I found the Apache Knox Gateway, which seems to be what we need. We work with Cloudera 4.5 for now; in the future, we will upgrade to Cloudera 5. My problem is that Knox does not seem to be compatible with Cloudera 4.5 (http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH-Version-and-Packaging-Information/cdhvd_topic_3.html). WebHDFS 2.4.0