Cloudera

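Offline CDH5 setup

微笑、不失礼 posted on 2020-01-12 15:20:25
Why did CDH appear?
1. Apache Hadoop version management is chaotic.
2. Deployment is tedious and upgrades are complicated.
3. Compatibility is poor and security is weak.
Hadoop distributions: Apache Hadoop; Cloudera's Distribution Including Apache Hadoop (CDH); Hortonworks Data Platform (HDP); MapR; EMR.
What is CDH? Cloudera's Distribution Including Apache Hadoop (CDH) is one of the many Hadoop distributions. It is maintained by Cloudera and built on stable Apache Hadoop releases. CDH provides Hadoop's core capabilities, distributed computing and scalable storage, along with a web-based user interface.
Advantages of CDH:
1. Clear version scheme.
2. Fast release cadence.
3. Support for Kerberos authentication.
4. Clear documentation.
5. Several installation methods (Cloudera Manager, Yum, RPM, Tarball).
This article installs CDH with Cloudera Manager.
Cloudera Manager is an end-to-end application for managing CDH. Put simply, it is a tool that provides automated cluster installation, centralized management, cluster monitoring, and alerting, which cuts cluster installation time from several days to a few hours.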

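Miscellaneous everyday error log

和自甴很熟 posted on 2020-01-11 05:30:21
Everyday errors
1. ./cloudera-scm-agent start fails.
Create the missing directory:
cd /opt/cloudera-manager/cm-5.7.0/run
mkdir cloudera-scm-agent
Set its ownership: chown cloudera-scm:cloudera-scm cloudera-scm-agent
2. ./scm_prepare_database.sh mysql -h myhost1.sf.cloudera.com -utemp -ptemp --scm-host myhost2.sf.cloudera.com scm scm scm fails.
It reports a ClassNotFoundException because the MySQL JDBC driver JAR (mysql-connector-java) is missing. Put that JAR in /opt/cloudera-manager/cm-5.7.0/share/cmf/lib
3. After logging in to CDH, only one managed host is listed.
Cause: the server entry in config.ini under /opt/cloudera-manager/cm-5.7.0/etc/cloudera-scm-agent/ was not changed to the server host's address.
4. Change the MySQL character encoding on Linux.
Edit my.cnf in /etc. Under [mysqld] add: default_character_set=utf8 (MySQL 5.5 and later reject this option under [mysqld]; the server-side equivalent there is character-set-server=utf8). If there is no [client] section, create [client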
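For item 3, the edit to the agent's config.ini can be sketched in Python. This is a minimal sketch under assumptions: the note only says "server", so the section name [General] and the key server_host are assumed from the usual layout of the agent config, and the host name cm-server.example.com is hypothetical.

```python
import configparser
import io


def set_agent_server_host(config_text: str, server_host: str) -> str:
    """Return config_text with server_host set in the [General] section.

    Assumption: the agent reads the server address from the key
    'server_host' under '[General]'. Comments in the original file are
    dropped, because configparser does not preserve them on write.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("General"):
        parser.add_section("General")
    parser.set("General", "server_host", server_host)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical sample content and host name, for illustration only.
sample = "[General]\nserver_host=localhost\nserver_port=7182\n"
updated = set_agent_server_host(sample, "cm-server.example.com")
print(updated)
```

Each agent host would need the same change, followed by an agent restart, before it shows up in the host list.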

HBase Thrift: how to connect to remote HBase master/cluster?

ⅰ亾dé卋堺 posted on 2020-01-11 03:19:12
Question: Thanks to the Cloudera distribution, I have an HBase master/datanode + Thrift server running on a local machine, and I can write, test, and run HBase client programs against it with no problem. However, I now need to use Thrift in production, and I can't find documentation on how to get Thrift running with a production HBase cluster. From what I understand, I will need to run the hbase-thrift program on the client node, since the Thrift program is just another intermediate client to HBase. So I'm

Find the location of hive-site.xml in java code

左心房为你撑大大i posted on 2020-01-07 03:05:39
Question: I want to pass the location of the hive-site.xml file to my Java program. What is the best way to find the location of this file automatically in Java code? I do not want to hard-code the path /etc/hive/conf/hive-site.xml for the Cloudera distribution.
Answer 1: By default, the Hadoop Configuration constructors search for "blahblah-site.xml" config files in the directories present in CLASSPATH. If they don't find them, they fall back to hard-coded "default" values, without any warning (!). So make sure
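The lookup the answer describes can be illustrated outside Java. The following is a minimal Python sketch of the idea only, not Hadoop's actual implementation: it walks the directory entries of a classpath-style string and returns the first one that contains the file. The function name find_on_classpath is hypothetical.

```python
import os
from typing import Optional


def find_on_classpath(filename: str, classpath: str) -> Optional[str]:
    """Return the first path to filename found in a classpath directory.

    Only directory entries are checked; JAR entries are skipped in this
    sketch. Returns None when no directory contains the file.
    """
    for entry in classpath.split(os.pathsep):
        if not entry or not os.path.isdir(entry):
            continue
        candidate = os.path.join(entry, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


# Uses the CLASSPATH environment variable if it is set.
location = find_on_classpath("hive-site.xml", os.environ.get("CLASSPATH", ""))
print(location if location else "hive-site.xml not found on CLASSPATH")
```

Reporting the "not found" case explicitly is the point of the sketch, since the answer notes that the silent fallback to defaults gives no warning.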

service specific users not created in cloudera

守給你的承諾、 posted on 2020-01-07 02:02:12
Question: I did not face any problems while installing Cloudera, but I just realized that I should have had users like oozie and hdfs created on my CentOS machine, I guess under the /home directory? But I do not have any such users under the home directory, and I am not able to log in as the oozie user with the su oozie command. Is it an installation problem, or is there some other way to do it? Now that I am trying to copy a JAR into the Oozie sharelib folder, it does not let me do so as the root user, and I do not see any
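An account can exist without having a directory under /home, so listing /home is not a reliable check. As a minimal sketch (Unix only, using Python's standard pwd module), the account database can be queried directly. The user names hdfs and oozie come from the question; whether they exist on a given machine is not something this sketch assumes.

```python
import pwd
from typing import Optional


def describe_account(username: str) -> Optional[dict]:
    """Return uid, home directory, and login shell, or None if no such user."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        return None
    return {"uid": entry.pw_uid, "home": entry.pw_dir, "shell": entry.pw_shell}


for name in ("hdfs", "oozie"):
    print(name, describe_account(name))
```

If an account is reported with a non-interactive shell, that alone could explain why a plain su to it does not give a login.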

CDH installation

心已入冬 posted on 2020-01-07 00:40:34
Environment setup
Prerequisites: network access, JDK configured, MySQL installed, time synchronized, passwordless SSH login.
Disable SELinux (takes effect after a reboot):
vi /etc/selinux/config
SELINUX=disabled
Install the third-party dependencies (on all nodes):
yum -y install chkconfig python bind-utils psmisc libxslt zlib sqlite cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs redhat-lsb httpd mod_ssl
Create the databases used by CM in MySQL:
-- cluster monitoring database
mysql> create database amon DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- Hive database
mysql> create database hive DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- Oozie database
mysql> create database oozie DEFAULT CHARSET utf8 COLLATE utf8_general_ci;
-- Hue database
create database hue DEFAULT
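The database statements above all follow one pattern, so they can be generated rather than typed by hand. A minimal sketch, assuming the same character set and collation apply to every database (the excerpt is cut off before the hue statement ends, so that part is an assumption); the function name create_database_sql is hypothetical.

```python
from typing import Iterable, List


def create_database_sql(
    names: Iterable[str],
    charset: str = "utf8",
    collation: str = "utf8_general_ci",
) -> List[str]:
    """Build one CREATE DATABASE statement per name.

    Names are restricted to plain identifiers so that nothing unexpected
    is spliced into the generated SQL.
    """
    statements = []
    for name in names:
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
        statements.append(
            f"CREATE DATABASE {name} DEFAULT CHARSET {charset} COLLATE {collation};"
        )
    return statements


for statement in create_database_sql(["amon", "hive", "oozie", "hue"]):
    print(statement)
```

The printed statements can be pasted into the mysql prompt or saved to a file and fed to the mysql client.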

Should Cloudera Manager 5 be installed on a compute node, or on a standalone server?

给你一囗甜甜゛ posted on 2020-01-05 14:03:40
Question: I'm installing a small cloud (10 nodes) using the free Cloudera Manager. Should I dedicate a server to Cloudera Manager, or can it be installed on one of the compute nodes? What's the best practice? I have an extra server to install the manager on, if that is a better idea.
Answer 1: Yes, you should include the Cloudera Manager host itself in your cloud. In the Cloudera Manager installer, it says: Cloudera recommends including Cloudera Manager server's host because it is often used for the

Cloudera Manager and hdfs-site.xml

谁说胖子不能爱 posted on 2020-01-05 05:58:30
Question: When using Cloudera Manager, I can access the hdfs-site.xml file via: Cloudera Manager > Cluster > HDFS > Instances > (NameNode, for example) > Processes Configuration Files > hdfs-site.xml. The URL then points to: http://quickstart.cloudera:7180/cmf/process/8/config?filename=hdfs-site.xml Is this file accessible directly via the file system, and if so, where is it located?
Answer 1: The configurations set in Cloudera Manager are stored in the Cloudera Manager database. They are not persisted

Multiple query execution in cloudera impala

拟墨画扇 posted on 2020-01-04 14:11:31
Question: Is it possible to execute multiple queries at the same time in Impala? If yes, how does Impala handle it?
Answer 1: I would certainly run some tests of your own, but I was not able to get multiple queries to execute. I was using an Impala connection and reading the query from a .sql file. This works for single commands.
from impala.dbapi import connect
# actual server and port changed for this post for security
conn = connect(host='impala server', port=11111, auth_mechanism="GSSAPI")
cursor = conn.cursor()
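Since the answer reports that single commands work, one workaround worth trying is to split the .sql text and execute the statements one at a time. The sketch below is an assumption-laden illustration, not a confirmed fix: it runs the statements sequentially rather than concurrently, the split on ";" is naive and breaks on semicolons inside string literals, and sqlite3 stands in for the Impala connection so the example runs anywhere. The function names are hypothetical.

```python
import sqlite3
from typing import List


def split_statements(sql_text: str) -> List[str]:
    """Naively split SQL text on ';' and drop empty fragments."""
    return [part.strip() for part in sql_text.split(";") if part.strip()]


def run_statements(cursor, sql_text: str) -> List[str]:
    """Execute each statement with its own cursor.execute() call."""
    executed = []
    for statement in split_statements(sql_text):
        cursor.execute(statement)
        executed.append(statement)
    return executed


# Stand-in database; with Impala the cursor would come from conn.cursor().
demo_conn = sqlite3.connect(":memory:")
demo_cursor = demo_conn.cursor()
script = (
    "CREATE TABLE t (x INTEGER); "
    "INSERT INTO t VALUES (1); "
    "INSERT INTO t VALUES (2);"
)
done = run_statements(demo_cursor, script)
demo_cursor.execute("SELECT COUNT(*) FROM t")
row_count = demo_cursor.fetchone()[0]
print(len(done), "statements executed;", row_count, "rows in t")
```

run_statements only needs an object with an execute method, so the same loop could be tried with the cursor from the answer's connection.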

Pig casting / datatypes

ぃ、小莉子 posted on 2020-01-04 08:15:22
Question: I'm trying to dump a relation into an Avro file, but I'm getting a strange error: org.apache.pig.data.DataByteArray cannot be cast to java.lang.CharSequence. I don't use DataByteArray (bytearray); see the description of the relation below. sensitiveSet: {rank_ID: long,name: chararray,customerId: long,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray} Even when I do explicit casting, I get the same error: sensitiveSet = foreach sensitiveSet generate (long) $0,