Cloudera

CDH Upgrade

Submitted by 牧云@^-^@ on 2019-12-26 22:32:51
The upgrade breaks down into two parts: 1. upgrading CM; 2. upgrading CDH.

Upgrading CM: there are two upgrade methods, 1. using packages, 2. using tarballs (follow the official upgrade guide; the tarball contains both the Cloudera Manager Server and the Cloudera Manager Agent). Upgrading CM and upgrading CDH are normally two independent processes: CM can be upgraded without shutting down CDH services, and CDH can be upgraded afterwards. Upgrading CM involves the following steps:

1. Collect upgrade information. Before upgrading, gather the information related to CM, including accounts, passwords, database URLs, and so on.
   1. Have root privileges, or at least sudo.
   2. Check the CM and JDK versions.
   3. Check the CDH version from the CM home page.
   4. Note which services are installed.
   5. Check the OS version: Hosts -> All Hosts, then click any host.
2. Complete the pre-upgrade preparation.
   1. The target version of this upgrade is 5.13.x; confirm the OS versions it supports.
   2. Check for user-defined services: Administration -> Settings -> Custom Service Descriptors.
3. Upgrade the JDK to 1.8. Set the Java directory in the UI: Hosts -> All Hosts -> Configuration -> Category -> Advanced. This only changes the JDK that CM and CDH depend on; other processes are unaffected.
4. Upgrade CM.
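The JDK check in the steps above can be done mechanically. A minimal Python sketch, assuming the usual `java -version` string format (e.g. `1.7.0_67`); the 1.8 minimum comes from the JDK-upgrade step:

```python
import re

def jdk_meets_minimum(version_string, minimum=(1, 8)):
    # Parse the leading "major.minor" of a JDK version string such as
    # "1.7.0_67" and compare it against the minimum required for the
    # CM/CDH 5.13.x upgrade (JDK 1.8, per the steps above).
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized JDK version: %r" % version_string)
    return (int(match.group(1)), int(match.group(2))) >= minimum

print(jdk_meets_minimum("1.7.0_67"))   # False -> JDK must be upgraded first
print(jdk_meets_minimum("1.8.0_131"))  # True
```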

Trying out Cloudera Spark Tutorial won't work “classnotfoundexception”

Submitted by 孤街浪徒 on 2019-12-25 11:14:09
Question: I tried the solutions suggested in similar existing posts, but none works for me :-( I'm getting really hopeless, so I decided to post this as a new question. I followed a tutorial (link below) on building a first Scala or Java application with Spark in a Cloudera VM. This is my spark-submit command and its output: [cloudera@quickstart sparkwordcount]$ spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local /home/cloudera/src/main/scala/com/cloudera/sparkwordcount/target
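A ClassNotFoundException from spark-submit usually means the path on the command line is not a jar that actually contains the `--class` entry; here a source directory was passed rather than the artifact produced by `mvn package` (or `sbt package`). A hedged Python sketch of the check spark-submit effectively performs, using an in-memory stand-in jar (the class name is taken from the question; the jar contents are fabricated for the demo):

```python
import io
import zipfile

def jar_contains_class(jar_bytes, fqcn):
    # spark-submit resolves --class against entries inside the application
    # jar: com.cloudera.sparkwordcount.SparkWordCount must exist as
    # com/cloudera/sparkwordcount/SparkWordCount.class in the archive.
    entry = fqcn.replace(".", "/") + ".class"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return entry in jar.namelist()

# Build a tiny stand-in "jar" in memory to demonstrate the check.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/cloudera/sparkwordcount/SparkWordCount.class", b"")

print(jar_contains_class(buf.getvalue(), "com.cloudera.sparkwordcount.SparkWordCount"))  # True
print(jar_contains_class(buf.getvalue(), "com.cloudera.sparkwordcount.Missing"))         # False
```

Passing the packaged jar from the project's `target/` directory to spark-submit, rather than a path under `src/`, is the usual fix.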

Using whirr to setup ec2 cluster

Submitted by ≡放荡痞女 on 2019-12-25 08:14:57
Question: I managed to launch a cluster of 10 nodes on Amazon EC2 using Whirr. Now I need to install R and packages. This is the command: whirr run-script --script /home/cloudera/TutorialBreen/config/whirr-ec2/install-r+packages.sh --config /home/cloudera/TutorialBreen/config/whirr-ec2/hadoop-ec2.properties Unfortunately I get an error because the link to the rmr package in the .sh file isn't live anymore. This is the original install-r+packages.sh file: sudo yum -y --enablerepo=epel install R R-devel

Spark job fails due to java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions

Submitted by 穿精又带淫゛_ on 2019-12-25 04:08:23
Question: I am having a problem running a Spark job via spark-submit due to the following error: 16/11/16 11:41:12 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String, java.util.Map, boolean, int, boolean, boolean, boolean) java.lang.NoSuchMethodException: org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(org.apache.hadoop.fs.Path, java.lang.String
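This NoSuchMethodException is a version mismatch: the Hive jars on the YARN classpath expose a `loadDynamicPartitions` with a different parameter list than the one Spark's Hive shim reflects on, so aligning the Spark build with the cluster's Hive version (or removing stray Hive jars from the classpath) is the usual remedy. A Python analogue of the reflective lookup, with illustrative class and parameter names (not the real Hive API):

```python
import inspect

class HiveShimOld:
    # Sketch of an older Hive signature (argument names are illustrative).
    def load_dynamic_partitions(self, path, table, part_spec, replace, num_dp):
        return "loaded"

class HiveShimNew:
    # Sketch of the 8-parameter signature the Spark build expects,
    # matching the boolean/int tail in the error message above.
    def load_dynamic_partitions(self, path, table, part_spec, replace,
                                num_dp, list_bucketing, is_acid, has_stats):
        return "loaded"

def has_method(obj, name, argcount):
    # Mirror of the Java reflective lookup: resolve the method by name and
    # parameter count; a mismatch is the analogue of NoSuchMethodException.
    fn = getattr(obj, name, None)
    if fn is None or not callable(fn):
        return False
    return len(inspect.signature(fn).parameters) == argcount

print(has_method(HiveShimOld(), "load_dynamic_partitions", 8))  # False
print(has_method(HiveShimNew(), "load_dynamic_partitions", 8))  # True
```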

Create impala table and add data using java

Submitted by 删除回忆录丶 on 2019-12-25 02:57:48
Question: I am trying to create an Impala table and add data into it using Java. 1.) How should I create a connection for Impala? 2.) Can I create an Impala table directly, or should I create a Hive table and access it using Impala? A Java program or code snippets would help. Answer 1: You could use the Impala JDBC driver. See the following link; it has sample code for the Impala connection using the JDBC driver. For creating a table in Impala you can pass it as a query; there is no need to do it in Hive. In case
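In Java the connection is opened with Cloudera's Impala JDBC driver against the Impala daemon (port 21050 by default); the exact driver class and JDBC URL form depend on the driver version, so check the driver's own documentation. The statements themselves are plain SQL, sketched here in Python with illustrative table and column names:

```python
def create_table_sql(table, columns):
    # Impala can create the table itself over JDBC; there is no need
    # to create it in Hive first, as the answer above notes.
    cols = ", ".join("%s %s" % (name, ctype) for name, ctype in columns)
    return "CREATE TABLE IF NOT EXISTS %s (%s) STORED AS PARQUET" % (table, cols)

def insert_sql(table, rows):
    # Literal-value INSERT for small demos; a real client should use the
    # driver's parameterized statements instead of string building.
    values = ", ".join("(%s)" % ", ".join(repr(v) for v in row) for row in rows)
    return "INSERT INTO %s VALUES %s" % (table, values)

print(create_table_sql("users", [("id", "INT"), ("name", "STRING")]))
print(insert_sql("users", [(1, "alice"), (2, "bob")]))
```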

Creating a hive table with ~40K columns

Submitted by ∥☆過路亽.° on 2019-12-25 02:44:46
Question: I'm trying to create a fairly large table, ~3 million rows and ~40K columns, using Hive. To begin, I'm creating an empty table and inserting the data into it. However, I hit an error when trying this: Unable to acquire IMPLICIT, SHARED lock default after 100 attempts. FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquire. retry after some time The query is pretty straightforward: create external table database.dataset ( var1 decimal(10,2), var2 decimal(10
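The error comes from Hive's concurrency lock manager giving up after its retry limit, not from the DDL itself; raising `hive.lock.numretries`/`hive.lock.sleep.between.retries`, or setting `hive.support.concurrency=false` when there are no concurrent writers, are the commonly suggested knobs. At ~40K columns the statement is also better generated than hand-written. A sketch following the `var1..varN decimal(10,2)` pattern from the question:

```python
def wide_table_ddl(db, table, n_cols, ctype="decimal(10,2)"):
    # Generate the CREATE EXTERNAL TABLE statement instead of maintaining
    # ~40K column definitions by hand; names follow the question's pattern.
    cols = ",\n  ".join("var%d %s" % (i, ctype) for i in range(1, n_cols + 1))
    return "CREATE EXTERNAL TABLE %s.%s (\n  %s\n)" % (db, table, cols)

ddl = wide_table_ddl("db1", "dataset", 40000)
print(ddl.count("decimal(10,2)"))   # 40000
print(ddl.splitlines()[1].strip())  # var1 decimal(10,2),
```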

Unable to insert 5k/sec records into impala?

Submitted by 不想你离开。 on 2019-12-25 00:15:45
Question: I am exploring Impala for a POC; however, I can't see any significant performance. I can't insert 5000 records/sec; at most I was able to insert a mere 200/sec. This is really slow for any database. I tried two different methods, but both are slow: Using Cloudera. First, I installed Cloudera on my system and added the latest CDH 6.2 cluster. I created a Java client to insert data using the ImpalaJDBC41 driver. I am able to insert records, but the speed is terrible. I tried tuning impala by
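Row-at-a-time INSERTs are the usual culprit here: each Impala INSERT is a separate query that writes its own small HDFS file, which caps throughput far below 5k/sec. Batching rows into multi-row VALUES lists amortizes that cost, though for real bulk ingest the better path is writing files (e.g. Parquet) to HDFS and loading them. A sketch of the batching idea only (table name and data are illustrative):

```python
def batched_insert_sql(table, rows, batch_size=1000):
    # Group rows into multi-row INSERT ... VALUES statements so one query
    # covers many rows instead of issuing one INSERT per record.
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        values = ", ".join("(%s)" % ", ".join(str(v) for v in row)
                           for row in chunk)
        yield "INSERT INTO %s VALUES %s" % (table, values)

stmts = list(batched_insert_sql("events", [(i, i * 2) for i in range(2500)]))
print(len(stmts))           # 3 statements instead of 2500
print(stmts[0].count("("))  # 1000 value tuples in the first batch
```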

Error importing dataset in Hive

Submitted by 梦想的初衷 on 2019-12-24 21:27:20
Question: I have a dataset as follows: John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600 Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601 Todd Jones^A70000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700 Bill King^A60000.0^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL
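^A (\x01), ^B (\x02) and ^C (\x03) are Hive's default delimiters for fields, collection items, and map key/value pairs, so a table declared with the matching complex types (ARRAY, MAP, STRUCT) imports this data without any ROW FORMAT clause. A Python sketch of how one row decomposes, with column names assumed from the data's shape:

```python
def parse_default_delimited(line):
    # Hive default delimiters: \x01 between fields, \x02 between collection
    # items, \x03 between map key and value. Assumed schema: name STRING,
    # salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING,FLOAT>,
    # address STRUCT<street,city,state,zip>.
    name, salary, subs, deds, addr = line.split("\x01")
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subs.split("\x02") if subs else [],
        "deductions": {k: float(v) for k, v in
                       (item.split("\x03") for item in deds.split("\x02"))},
        "address": addr.split("\x02"),
    }

row = parse_default_delimited(
    "John Doe\x01100000.0\x01Mary Smith\x02Todd Jones\x01"
    "Federal Taxes\x03.2\x02State Taxes\x03.05\x02Insurance\x03.1\x01"
    "1 Michigan Ave.\x02Chicago\x02IL\x0260600"
)
print(row["subordinates"])                 # ['Mary Smith', 'Todd Jones']
print(row["deductions"]["Federal Taxes"])  # 0.2
```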

Kerberos | Cloudera | KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96

Submitted by Deadly on 2019-12-24 12:22:53
Question: I have been trying to set up Kerberos for CDH 4.5, which was set up using the Cloudera Manager Installer. The instructions are from the following link: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.5.2/Configuring-Hadoop-Security-with-Cloudera-Manager/cmeechs_topic_4.html After setting up the KDC I copied the JCE policy files for Java 6 to the following location: /usr/java/jdk1.6.0_31/lib/security/ Following is my "/var/kerberos/krb5kdc/kdc.conf" file: [kdcdefaults] kdc
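Two common resolutions for this error, both hedged as general Kerberos/JCE practice rather than anything stated in the truncated question: the JCE unlimited-strength policy jars are normally installed under `$JAVA_HOME/jre/lib/security` (note the `jre` segment, which the path above lacks), or AES-256 can be dropped from the KDC's supported encryption types so the default JCE policy suffices. An illustrative kdc.conf fragment (realm name and remaining enctypes are placeholders, not from the source):

```
[realms]
 EXAMPLE.COM = {
  # Dropping aes256-cts avoids the unlimited-strength JCE requirement;
  # existing principals must be re-keyed after changing enctypes.
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal
 }
```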