Cloudera

YARN UNHEALTHY nodes

牧云@^-^@ Submitted on 2019-12-09 17:16:31
Question: In our YARN cluster, which is 80% full, we are seeing some of the YARN NodeManagers marked as UNHEALTHY. After digging into the logs I found it is because disk usage has reached 90% on the data dirs, with the following error: 2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4 local-dirs are bad: /data3/yarn/nm,/data2/yarn/nm,/data4/yarn/nm,/data1/yarn/nm; 2015-02-21 08:33:51,590 INFO org.apache.hadoop
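The NodeManager's disk health checker marks a local-dir as bad once its utilization crosses a threshold (90% by default), and the node goes UNHEALTHY when enough dirs are bad. One common stopgap, sketched below for yarn-site.xml, is to raise that threshold while disk space is being reclaimed; the value 95.0 is only an illustration, not a recommendation:

```xml
<!-- yarn-site.xml: raise the per-disk utilization cutoff used by the
     NodeManager disk health checker (default is 90.0 percent).
     95.0 is an example value chosen for illustration. -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>95.0</value>
</property>
```

Freeing space under the local-dirs (and restarting the NodeManager) is the real fix; raising the threshold only buys time before the disks actually fill.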

Virtual machine “Cloudera quick start” not booting

你。 Submitted on 2019-12-09 16:30:50
Question: I have recently downloaded the "QuickStart VM" from http://www.cloudera.com (specifically, the VirtualBox version). This virtual machine uses CentOS (and my computer is a MacBook Air). I cannot fully start this virtual machine, and I do not know why. I have attached a screenshot of the furthest point the boot reaches. Answer 1: I've discovered that when your screen appears to be frozen at that location, pressing [ESC] is apparently what you're supposed to do next. Mine was there, sitting there for a few

Install Hue without Cloudera

这一生的挚爱 Submitted on 2019-12-09 14:25:34
Question: Has anyone tried or succeeded in installing Hue on Hadoop without Cloudera? I have gotten to a point where I can reliably reproduce a Hadoop cluster with HBase and Hive and can set it all up in about 15 minutes. I'd love to have Hue along with all this without having to go back and redo my setup with Cloudera. Answer 1: Check out slides #19 & #5; Hue is getting everywhere and is compatible with Hadoop 0.20 / 1.2.0 / 2.2.0: http://gethue.com/hue-goes-to-paris-hug-france/ Hue has tarball releases

Unable to start CDH4 secondary name node: Invalid URI for NameNode address

…衆ロ難τιáo~ Submitted on 2019-12-08 20:23:02
Question: I've been trying to set up a CDH4 installation of Hadoop. I have 12 machines, labeled hadoop01 - hadoop12, and the namenode, job tracker, and all data nodes have started fine. I'm able to view dfshealth.jsp and see that it has found all the data nodes. However, whenever I try to start the secondary name node it throws an exception: Starting Hadoop secondarynamenode: [ OK ] starting secondarynamenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-secondarynamenode-hadoop02.dev.terapeak.com.out
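An "Invalid URI for NameNode address" at secondarynamenode startup usually means the secondary host cannot derive the NameNode URI from its own configuration, typically because core-site.xml on that machine is missing the default filesystem entry. A hedged sketch of the relevant entry is below; hadoop01:8020 is a placeholder guessed from the machine naming in the question, and on older CDH4 configs the equivalent key is fs.default.name:

```xml
<!-- core-site.xml on the secondarynamenode host (hadoop02 here).
     hdfs://hadoop01:8020 is a placeholder; substitute the real
     NameNode hostname and RPC port for this cluster. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop01:8020</value>
</property>
```

It is worth diffing this file against the copy on a working data node, since Cloudera-managed and hand-rolled configs can drift per host.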

CDH4: Version conflict: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected

寵の児 Submitted on 2019-12-08 20:05:57
I'm trying to upgrade from CDH3 to CDH4 and am getting a version conflict from compile to run time. I'm getting this error: Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected From googling it seems that my code is being compiled against Hadoop 1.x and is running on Hadoop 2.0. I'm compiling and running the app on the same Hadoop client, so it should all be Hadoop 2.0. Here's what I get from running "hadoop version" on the client or any of the other nodes in this test cluster: Hadoop 2.0.0-cdh4.4.0
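The interface-vs-class error on org.apache.hadoop.mapreduce.Counter is the classic signature of a job jar compiled against Hadoop 1.x running on Hadoop 2.x (MRv2), where Counter changed from a class to an interface. If the build uses Maven, a hedged fix is to compile against the CDH4 artifacts; the version below matches the "hadoop version" output in the question, and the repository URL is Cloudera's public Maven repository:

```xml
<!-- pom.xml fragment: build against CDH4 (Hadoop 2.x) artifacts
     instead of stock Apache Hadoop 1.x jars. -->
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.0.0-cdh4.4.0</version>
    <!-- provided: the cluster supplies Hadoop jars at run time -->
    <scope>provided</scope>
  </dependency>
</dependencies>
```

Also check that no stale Hadoop 1.x jar is bundled inside the job jar or left on the client classpath, since a leftover jar reproduces the same error even after a clean rebuild.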

Hadoop Overview

馋奶兔 Submitted on 2019-12-08 19:01:47
Reposted from: http://baike.baidu.com/link?url=HwhPVuqqWelWIr0TeSBGPZ5SjoaYb5_Givp9-rJN-PYbSTMlwpECSKvjzLBzUE7hn9VvmhDoKb5NNCPw1pCsTa Hadoop is a distributed-system infrastructure developed by the Apache Foundation. Users can develop distributed programs without understanding the low-level details of distribution, making full use of a cluster's power for high-speed computation and storage. [1] Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware; it provides high throughput for accessing application data, making it well suited to applications with very large data sets. HDFS relaxes certain POSIX requirements in order to allow streaming access to data in the file system. The core of Hadoop's design is HDFS and MapReduce: HDFS provides storage for massive data, while MapReduce provides computation over it. [2]

Big-Data Analysis with Hadoop and Python

断了今生、忘了曾经 Submitted on 2019-12-08 18:59:57
Big data: 1. Distributed: a master node (Master) and worker nodes (Slaves). 2. Cluster (multiple machines): stores data and processes it in parallel. 3. Distributed computing, whose core idea is divide and conquer. I. Hadoop 1. Apache Hadoop. Introduction: a tool for distributed, parallel processing of data across multiple servers; it can scale out to arbitrarily large data volumes, which is how it addresses big-data workloads. Characteristics: scalability, flexibility, fault tolerance, and low cost. Function: Apache Hadoop is a 100% open-source framework with two main functions: (1) storing big data, and (2) processing big data. 2. Key Hadoop modules. (1) HDFS: stores massive data in a distributed fashion by splitting large files into small blocks (128 MB by default). (2) YARN: manages cluster resources (memory and CPU cores) and allocates them to running applications such as MapReduce and Spark. (3) MapReduce: a framework for analyzing massive data. Its idea is divide and conquer: split a large data file into many small pieces, start one Map task to process each piece, and when they finish, start a Reduce task to merge the results of all the Map tasks. 3. Hadoop module workflow. (1) HDFS (data storage): stores data in a distributed way, splitting large files into small Block files kept on the disks of the cluster's nodes; each block is kept in three replicas. (2) YARN (resource management): in YARN
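The MapReduce flow described above (one Map task per data chunk, then a Reduce task merging all Map outputs) can be sketched in plain Python with a word count. This is only an in-process illustration of the divide-and-conquer idea, not the Hadoop API; the chunk strings stand in for HDFS blocks:

```python
from collections import defaultdict
from itertools import chain

def map_task(chunk):
    # Map phase: one task per chunk, emitting (word, 1) pairs.
    return [(word, 1) for word in chunk.split()]

def reduce_task(pairs):
    # Reduce phase: merge every mapper's output by key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Two "blocks" of input, as if a large file had been split.
chunks = ["big data big cluster", "data divide and conquer"]
mapped = [map_task(c) for c in chunks]             # one Map task per chunk
result = reduce_task(chain.from_iterable(mapped))  # one Reduce task merges all
print(result)
```

In real Hadoop the Map tasks run on the nodes holding the blocks and a shuffle phase routes pairs to reducers by key, but the data flow is the same shape.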

Livy Server: return a dataframe as JSON?

∥☆過路亽.° Submitted on 2019-12-08 16:33:33
Question: I am executing a statement in Livy Server using an HTTP POST call to localhost:8998/sessions/0/statements with the following body { "code": "spark.sql(\"select * from test_table limit 10\")" } I would like an answer in the following format (...) "data": { "application/json": "[ {"id": "123", "init_date": 1481649345, ...}, {"id": "133", "init_date": 1481649333, ...}, {"id": "155", "init_date": 1481642153, ...}, ]" } (...) but what I'm getting is (...) "data": { "text/plain": "res0: org.apache
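Livy's statement output can carry an application/json mime bundle via its %json output magic: assign the result to a variable and reference that variable after %json. A hedged sketch of the POST body for a Scala (spark) session is below; test_table comes from the question, result is an arbitrary variable name, and whether a toJSON step is needed first can vary by Livy version:

```json
{
  "code": "val result = spark.sql(\"select * from test_table limit 10\").toJSON.collect\n%json result"
}
```

With a plain statement and no magic, Livy returns only the REPL's text/plain echo, which is exactly the truncated res0 output shown above.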

OOZIE : Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]

邮差的信 Submitted on 2019-12-08 11:19:27
Question: I'm trying to execute an Oozie job following this tutorial: https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html When I run oozie job -run -config target/example/job.properties I get this error: Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1 Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry

Is Apache Knox Gateway compatible with Cloudera 4.5?

随声附和 Submitted on 2019-12-08 09:48:03
Question: I'm currently working on an upcoming project involving a Hadoop cluster, and I need information about securing the cluster. I found the Apache Knox Gateway, which seems to be what we need. We work with Cloudera 4.5 for now; in the future, we will upgrade to Cloudera 5. My problem is that Knox does not seem to be compatible with Cloudera 4.5 (http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH-Version-and-Packaging-Information/cdhvd_topic_3.html). WebHDFS 2.4.0