Cloudera

Is Hadoop in Docker container faster/worth it? [closed]

Submitted by 感情迁移 on 2019-12-06 04:06:07
Closed as opinion-based; it is not currently accepting answers (closed 3 years ago). I have a Hadoop-based environment. I use Flume, Hue, and Cassandra in this system. There is a lot of hype around Docker nowadays, so I would like to examine the pros and cons of dockerizing this setup. I think it would be much more portable, but the current system can be set up with Cloudera Manager in a few clicks. Is it perhaps faster, and why would it be worth it? What are the advantages? Maybe …

Download file weekly from FTP to HDFS

Submitted by 瘦欲@ on 2019-12-06 03:05:50
Question: I want to automate a weekly download of a file from an FTP server into a CDH5 Hadoop cluster. What would be the best way to do this? I was thinking about an Oozie coordinator job, but I can't think of a good method for downloading the file. Answer 1: Since you're using CDH5, it's worth noting that the NFSv3 interface to HDFS is included in that Hadoop distribution. Check "Configuring an NFSv3 Gateway" in the CDH5 Installation Guide documentation. Once that's done, you could use wget, …
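The answer is cut off at the wget step; the sketch below shows one plausible end-to-end shape of this approach, assuming the NFSv3 gateway is already configured. The gateway host, FTP credentials, and paths are placeholders, not details from the original thread.

# Mount the HDFS NFSv3 gateway (usually done once, e.g. at boot).
mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway-host:/ /mnt/hdfs
# Pull the weekly file from FTP straight onto the mounted HDFS path.
wget --user=ftpuser --password=ftppass \
     -O /mnt/hdfs/data/incoming/weekly_$(date +%F).csv \
     ftp://ftp.example.com/exports/weekly.csv

Scheduled from cron, this sidesteps Oozie entirely; an entry such as 0 2 * * 1 /opt/scripts/fetch_weekly.sh would run it every Monday at 02:00.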

Cloudera Impala: An Open-Source Real-Time Query Project on Hadoop

Submitted by 狂风中的少年 on 2019-12-06 03:02:26
News from Strata Conference + Hadoop World, the big-data conference under way in New York: Cloudera has released the 1.0 beta of Impala, an open-source real-time query project, claiming query speeds 3 to 90 times faster than the MapReduce-based Hive SQL (for details see the "How much faster are Impala queries than Hive ones, really?" section of that article), along with greater flexibility and ease of use. An impala is an antelope found mainly in East Africa. The project will also enter the CDH distribution under the name Cloudera Enterprise RTQ (Real-Time Query); a production-ready release is expected in the first quarter of 2013. According to reports from ComputerWorld and MarketWatch, Capgemini Financial Services, Karmasphere, MicroStrategy, Pentaho, QlikView, Tableau, and others have already spent several months testing Impala against real products. As is well known, Hadoop, HBase, and HDFS were developed under the inspiration of Google's three papers on MapReduce, BigTable, and GFS. In recent years Google's infrastructure has seen another wave of innovation, which some media call the post-Hadoop-era troika: Caffeine, Pregel, and Dremel …

YARN Application exited with exitCode: -1000 Not able to initialize user directories

Submitted by 萝らか妹 on 2019-12-06 02:05:06
Question: I am getting: Application application_1427711869990_0001 failed 2 times due to AM Container for appattempt_1427711869990_0001_000002 exited with exitCode: -1000 due to: Not able to initialize user directories in any of the configured local directories for user kailash. Failing this attempt.. Failing the application. I couldn't find anything related to this exit code and the associated reason. I am using Hadoop 2.5.0 (Cloudera 5.3.2). Answer 1: Actually this is due to permission issues on some …
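The answer is truncated at the cause; in practice this error usually means the per-user cache under yarn.nodemanager.local-dirs has stale or wrong ownership. A hedged sketch of the common remedy (the path below is only the Cloudera Manager default, not something stated in the thread):

# Run on every NodeManager host; verify yarn.nodemanager.local-dirs in
# yarn-site.xml first -- /yarn/nm is only the Cloudera Manager default.
rm -rf /yarn/nm/usercache/kailash
# Then restart the NodeManager role so YARN recreates the directories
# with the correct ownership.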

Possible memory issue crashing HBase Thrift Server

Submitted by 房东的猫 on 2019-12-06 01:26:04
I'm running Cloudera CDH4 with HBase and the HBase Thrift Server. Several times a day, the Thrift Server crashes. In /var/log/hbase/hbase-hbase-thrift-myserver.out, there is this: # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError="kill -9 %p" # Executing /bin/sh -c "kill -9 8151"... In /var/log/hbase/hbase-hbase-thrift-myserver.log, there are no error messages at the end of the file, only a lot of DEBUG messages stating that one of the nodes is caching a particular file. I can't figure out any configuration options for the HBase Thrift Server. There are no obvious …
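The .out file shows the JVM exhausting its heap and then killing itself via -XX:OnOutOfMemoryError, so the usual first step is simply to give the Thrift server more heap. A minimal sketch, assuming a package-based CDH4 install (in Cloudera Manager the equivalent knob is the Thrift Server role's Java heap setting); the 2 GB figure is an arbitrary example, not a recommendation from the thread:

# Raise the Thrift server heap in hbase-env.sh.
echo 'export HBASE_THRIFT_OPTS="-Xms2g -Xmx2g"' >> /etc/hbase/conf/hbase-env.sh
service hbase-thrift restart   # CDH4 package service name; adjust if yours differs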

Multi-node Hadoop cluster with Docker

Submitted by ♀尐吖头ヾ on 2019-12-05 20:50:07
Question: I am in the planning phase of a multi-node Hadoop cluster in a Docker-based environment, so it should be built on a lightweight, easy-to-use virtualized system. The current architecture (according to the documentation) contains 1 master and 3 slave nodes. The host machine uses the HDFS filesystem and KVM for virtualization. The whole cloud is managed by Cloudera Manager. There are several Hadoop modules installed on this cluster. There is also a NodeJS data upload service. This time I should make …
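For a cluster of this shape, the usual Docker starting point is a user-defined bridge network so the containers resolve one another by hostname. The sketch below is illustrative only; the image name is a placeholder and 7180 is assumed to be Cloudera Manager's default UI port, neither being a detail from the question:

# One network for the whole cluster; containers on it get DNS by name.
docker network create hadoop-net
# Master node, exposing the Cloudera Manager UI.
docker run -d --name master --hostname master --network hadoop-net \
  -p 7180:7180 my-hadoop-image
# Three slave nodes.
for i in 1 2 3; do
  docker run -d --name "slave$i" --hostname "slave$i" \
    --network hadoop-net my-hadoop-image
done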

CDH 6.2 Installation

Submitted by 淺唱寂寞╮ on 2019-12-05 19:35:58
I. Preparation
1. Prepare a regular user with root privileges (via sudo; here the aiot user is the sudoer)
groupadd -r test
useradd -d /home/test/ -m -s /bin/bash -c "user" -g test -p 2019 -r test
# Special note: the aiot user and its password get configured into Cloudera Manager Server for cluster administration (adding hosts, installing/uninstalling agents, managing parcels as the OS user), so the aiot password must not be changed casually at the OS level; if it has to change, the ops team must notify the CDH cluster administrator to update the password stored in CM in sync.
passwd test
As root: vim /etc/sudoers
test ALL=(ALL) NOPASSWD: ALL
2. Required tools (these support a local offline repository; see the sketch after this section)
sudo yum install yum-utils createrepo -y
sudo yum install ansible -y
sudo yum install httpd -y
sudo yum install ntp -y
# systemctl and hostnamectl are part of the systemd package on CentOS 7; they need no separate yum install.
3. Cluster /etc/hosts setup
sudo vim /etc/hosts
cat
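A hedged sketch of what yum-utils, createrepo, and httpd are typically for in an offline CDH install: building and serving a local RPM/parcel repository. The directory layout below is an assumption, not something the original post specifies:

# Create a repo directory, index it, and serve it over HTTP.
sudo mkdir -p /var/www/html/cdh6
# (copy the downloaded CDH RPMs/parcels into /var/www/html/cdh6 first)
sudo createrepo /var/www/html/cdh6
sudo systemctl enable --now httpd
# Each node can then point a .repo file at http://<repo-host>/cdh6/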

03 - CDH 6.3.x Installation

Submitted by 拥有回忆 on 2019-12-05 19:15:32
CDH 6.3.x offline installation. Environment: CDH 6.3.1, CentOS 7; see the official documentation.
Change the hostnames and configure the hosts file:
# Change each host's name as needed
hostnamectl set-hostname node1
hostnamectl set-hostname node2
hostnamectl set-hostname node3
# Edit the hosts file on every host
vi /etc/hosts
172.16.1.181 node1
172.16.1.182 node2
172.16.1.183 node3
Disable the system's default security protections. Disable the firewall:
systemctl stop firewalld && systemctl disable firewalld && systemctl status firewalld
Disable SELinux: see 关闭SELinux.md (a sketch follows this section).
Passwordless SSH: key-based login from node1 to the other nodes is enough; for configuration see Linux SSH 使用密钥登陆.md. All nodes use the same password, which is used during installation.
Clock synchronization between machines: all nodes must have identical time; for configuration see Linux 配置时钟同步.md.
PostgreSQL database: several databases would work; PG is used here; for configuration see PostgreSQL 安装之 CentOS 7 x64 RPM 安装.md. Be sure to enable remote access so every node can reach the database, and be sure to install the driver.
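Since the two referenced documents are external, here is a minimal sketch of the SELinux and clock-sync steps, assuming stock CentOS 7 defaults rather than the author's exact procedure:

# Disable SELinux now and across reboots.
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# Keep all nodes' clocks in agreement with ntpd.
yum install -y ntp
systemctl enable --now ntpd
ntpq -p   # verify that peers are reachable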

HDFS as volume in cloudera quickstart docker

Submitted by 一曲冷凌霜 on 2019-12-05 16:54:06
I am fairly new to both Hadoop and Docker. I have been working on extending the cloudera/quickstart Docker image's Dockerfile and wanted to mount a directory from the host and map it to an HDFS location, so that performance increases and data persists locally. When I mount a volume anywhere else with -v /localdir:/someDir, everything works fine, but that's not my goal. When I do -v /localdir:/var/lib/hadoop-hdfs, both the datanode and namenode fail to start and I get: "cd /var/lib/hadoop-hdfs: Permission denied". And when I do -v /localdir:/var/lib/hadoop-hdfs/cache, there is no permission denied, but the datanode …
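The permission error is the classic bind-mount ownership mismatch: the host directory is owned by a host user, while inside the container /var/lib/hadoop-hdfs must belong to the hdfs user. A hedged sketch of the usual fix, with the uid/gid values as placeholders to be read from the image first:

# Find the hdfs user's numeric uid/gid inside the image.
docker run --rm cloudera/quickstart id hdfs
# Give the host directory that same numeric owner (496:496 is only an
# example; substitute whatever the previous command printed).
sudo chown -R 496:496 /localdir
# Then mount as before, using the image's documented start command.
docker run --hostname=quickstart.cloudera --privileged=true -t -i \
  -v /localdir:/var/lib/hadoop-hdfs \
  cloudera/quickstart /usr/bin/docker-quickstart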

Adding Nodes to CDH 5.16.1

Submitted by 余生颓废 on 2019-12-05 14:54:36
1. Environment
CentOS 7.6, CDH 5.16.1
2. Server preparation
1. Set the hostname and hosts:
vim /etc/hostname
vim /etc/hosts
2. Disable SELinux by changing SELINUX=enforcing to SELINUX=disabled:
vim /etc/selinux/config
3. Configure passwordless SSH:
# 1. Generate a key pair on the new node
ssh-keygen -t rsa
# 2. Copy this machine's public key to the other servers in the cluster
cd ~/.ssh
ssh-copy-id cdh01
4. Set vm.swappiness=10:
cat /proc/sys/vm/swappiness  # check the current value
vi /etc/sysctl.conf
vm.swappiness=10  # add this line at the end of the file; run sysctl -p (or reboot) to apply it
5. Install Java and Scala.
6. Create the Java symlink:
mkdir /usr/java
ln -s /opt/module/jdk1.8.0_144/ /usr/java/default
7. Copy the MySQL JAR file into /usr/share/java/ (see the note below):
mkdir /usr/share/java/
cp /opt/software/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27
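The final copy command is cut off mid-filename. Cloudera's tooling looks for the JDBC driver under a fixed, unversioned name, so the step presumably ends along these lines (the exact source jar filename is an assumption):

# CDH expects this exact target filename, with the version stripped.
cp /opt/software/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar \
   /usr/share/java/mysql-connector-java.jar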