Cloudera

HistoryServer not able to read logs after enabling Kerberos

旧城冷巷雨未停 submitted on 2019-12-12 00:14:26
Question: I enabled Kerberos on the cluster and it is working fine. But due to some issue, the mapred user is not able to read and display logs on the JobHistory server. I checked the JobHistory server logs and it gives an access error: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=READ_EXECUTE, inode="/user/history/done_intermediate/prakul":prakul:hadoop:drwxrwx--- As we can see, the directory grants access to the hadoop group, and mapred is in the hadoop group, even then
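The check HDFS applies here is the standard POSIX-style owner/group/other test. A small sketch of that logic (a hypothetical helper, not the actual Hadoop code) shows why group membership is the thing to verify: HDFS resolves a user's groups on the NameNode, so `mapred` belonging to `hadoop` on a client box is not enough.

```python
# Sketch of the POSIX-style permission check HDFS performs for an inode.
# Hypothetical helper for illustration, not the Hadoop implementation.

def can_access(user, user_groups, owner, group, mode, want):
    """mode is an int like 0o770; want is a bitmask: 4=read, 1=execute."""
    if user == owner:
        bits = (mode >> 6) & 0o7          # owner bits
    elif group in user_groups:
        bits = (mode >> 3) & 0o7          # group bits
    else:
        bits = mode & 0o7                 # other bits
    return (bits & want) == want

# inode "/user/history/done_intermediate/prakul": prakul:hadoop drwxrwx---
# READ_EXECUTE = read (4) + execute (1) = 5
print(can_access("mapred", {"hadoop"}, "prakul", "hadoop", 0o770, 5))  # True
print(can_access("mapred", set(),      "prakul", "hadoop", 0o770, 5))  # False
```

The second call mirrors the reported failure: if the NameNode's group mapping does not return `hadoop` for `mapred` (for instance, the group exists only on a client host), the check falls through to "other" and is denied. Running `hdfs groups mapred` against the cluster shows what the NameNode actually sees.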

How to handle potential data loss when performing comparisons across data types in different groups

ぃ、小莉子 submitted on 2019-12-11 19:35:53
Question: Background: Our group is going through a Cloudera upgrade to 6.1.1, and I have been tasked with determining how to handle the loss of implicit data type conversion across data types. See the link below for the relevant Release Note details. https://docs.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_611_incompatible_changes.html#hive_union_all_returns_incorrect_data Not only does this issue affect UNION ALL queries, but there is a function that performs comparisons on
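One concrete way an implicit conversion can silently lose data is widening a 64-bit integer to a floating-point double for a comparison: integers above 2**53 cannot all be represented exactly, so two distinct values can compare equal after the cast. The same IEEE-754 arithmetic is easy to demonstrate in Python (an analogy for the behavior, not Hive code):

```python
# A double has a 53-bit significand, so 2**53 and 2**53 + 1 collapse
# to the same value once widened to floating point.
a = 2**53        # 9007199254740992
b = 2**53 + 1    # 9007199254740993
print(a == b)                # False: distinct integers
print(float(a) == float(b))  # True: indistinguishable as doubles
```

This is why explicit CASTs to a common exact type (rather than relying on implicit widening) are the usual remedy when auditing comparisons across data types.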

CDH4 - Exception: java.lang.IncompatibleClassChangeError:

浪尽此生 submitted on 2019-12-11 18:09:30
Question: I am getting a Java issue when I launch a Pig script; it appears to be some dependency or version conflict. Running Debian / Cloudera CDH4 / Apache Pig. java.lang.Exception: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406) Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.Counter, but class was expected Answer 1: The

Cloudera Manager isn't opening

你离开我真会死。 submitted on 2019-12-11 17:08:04
Question: I have a VM, cloudera-quickstart-vm-5.13.0-0-virtualbox, running now. But the Cloudera Manager page isn't being shown. The message 'Attempting to connect to Cloudera Manager...' is shown all day. How can I solve this problem? Answer 1: Cloudera Manager has to be restarted separately in the quickstart VM. You can run the command below and see if it works: /home/cloudera/cloudera-manager --force --express Source: https://stackoverflow.com/questions/56474423/cloudera-manager-isnt-opening

Timestamp issue in Hive 1.1

三世轮回 submitted on 2019-12-11 14:21:46
Question: I am facing a very weird issue with Hive in a production environment (Cloudera 5.5) which is basically not reproducible on my local server (don't know why): for some records I get a wrong timestamp value while inserting from a temp table into the main table, as the string "2017-10-21 23" is converted into a timestamp "2017-10-21 23:00:00" value on insertion. Example: 2017-10-21 23 -> 2017-10-21 22:00:00, 2017-10-22 15 -> 2017-10-22 14:00:00. It happens very, very infrequently. Means the delta value is
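A consistent one-hour shift like "2017-10-21 23" becoming "2017-10-21 22:00:00" is the classic symptom of the string being parsed under one timezone assumption and the timestamp rendered under another, one hour behind. That is not necessarily the cause here, but the arithmetic is easy to reproduce; the offsets below are hypothetical and would need checking against the actual cluster nodes:

```python
from datetime import datetime, timezone, timedelta

# Hypothetical zones for illustration: the writer interprets the bare
# string as UTC+1, the reader renders the stored instant in UTC.
writer_zone = timezone(timedelta(hours=1))
reader_zone = timezone.utc

parsed = datetime.strptime("2017-10-21 23", "%Y-%m-%d %H").replace(tzinfo=writer_zone)
rendered = parsed.astimezone(reader_zone).strftime("%Y-%m-%d %H:%M:%S")
print(rendered)  # 2017-10-21 22:00:00
```

The intermittency would also fit this explanation if only some nodes (or only sessions around a DST boundary) carry the mismatched zone, which is worth ruling out before suspecting Hive itself.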

HDFS using Cloudera Manager in private cloud

╄→гoц情女王★ submitted on 2019-12-11 13:21:08
Question: This is driving me crazy. I have been working on this for days and just can't seem to solve the issue. I have a private cloud running on Eucalyptus for testing, with 4 VMs running Ubuntu 12.04. I am trying to get Cloudera to run HDFS and MapReduce; however, when I try to start it up, the datanodes never seem to be able to communicate with the namenode. It installs fine and passes all the pre-launch checks. The hosts files are all set up with 127.0.0.1 localhost and the IPs and hostnames of the
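A common cause of datanodes failing to reach the namenode in setups like this is the namenode's own hostname being mapped to 127.0.0.1 in /etc/hosts, which makes its RPC port bind to loopback only, unreachable from other VMs. A small checker for that pattern (a hypothetical helper, not part of any Hadoop tooling) can be sketched as:

```python
# Flag hostnames that /etc/hosts maps to a loopback address; such a
# mapping on the namenode host makes its services bind to 127.0.0.1.

def loopback_hostnames(hosts_text):
    flagged = []
    for line in hosts_text.splitlines():
        line = line.split("#")[0].strip()   # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        if ip.startswith("127."):
            flagged.extend(n for n in names
                           if n not in ("localhost", "localhost.localdomain"))
    return flagged

hosts = """127.0.0.1 localhost namenode1
10.0.0.5 datanode1
"""
print(loopback_hostnames(hosts))  # ['namenode1'] -> map it to its real IP instead
```

If the check flags anything on the namenode host, moving that hostname onto its real interface address in /etc/hosts (leaving only localhost on 127.0.0.1) is the usual fix before restarting the services.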

MapReduce: sort values

牧云@^-^@ submitted on 2019-12-11 13:09:52
Question: I have output from my mapper: Mapper: KEY, VALUE(Timestamp, someOtherAttributes). My reducer receives: Reducer: KEY, Iterable<VALUE(Timestamp, someOtherAttributes)>. I want the Iterable<VALUE(Timestamp, someOtherAttributes)> to be ordered by the Timestamp attribute. Is there any way to implement this? I would like to avoid manual sorting inside the reducer code. http://cornercases.wordpress.com/2011/08/18/hadoop-object-reuse-pitfall-all-my-reducer-values-are-the-same/ I'll have to "deep-copy"
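The standard Hadoop answer here is a secondary sort: move the timestamp into a composite key so the shuffle sorts on (key, timestamp), then group on the natural key alone so each reducer call sees its values already ordered. The mechanics can be sketched in plain Python (illustrative names, not Hadoop APIs):

```python
from itertools import groupby
from operator import itemgetter

# Mapper output: (natural_key, timestamp, payload)
records = [
    ("user1", 30, "c"),
    ("user1", 10, "a"),
    ("user2", 5,  "x"),
    ("user1", 20, "b"),
]

# Shuffle phase analogue: sort on the composite (key, timestamp)...
records.sort(key=itemgetter(0, 1))

# ...grouping comparator analogue: group on the natural key only, so the
# values inside each group arrive in timestamp order with no reducer sort.
results = {key: [payload for _, _, payload in grp]
           for key, grp in groupby(records, key=itemgetter(0))}
print(results)  # {'user1': ['a', 'b', 'c'], 'user2': ['x']}
```

In real Hadoop this corresponds to a custom WritableComparable key plus a partitioner and grouping comparator on the natural key, which also sidesteps the object-reuse pitfall the linked post describes, since nothing needs to be buffered in the reducer.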

What is 'Active Jobs' in Spark History Server Spark UI Jobs section

て烟熏妆下的殇ゞ submitted on 2019-12-11 12:57:18
Question: I'm trying to understand the Spark History Server components. I know that the History Server shows completed Spark applications. Nonetheless, I see 'Active Jobs' set to 1 for a completed Spark application. I'm trying to understand what 'Active Jobs' means in the Jobs section. Also, the application completed within 30 minutes, but when I opened the History Server after 8 hours, 'Duration' showed 8.0h. Please see the screenshot. Could you please help me understand 'Active Jobs', 'Duration' and 'Stages: Succeeded

Accessing HBase table data from Hive based on Time Stamp

主宰稳场 submitted on 2019-12-11 12:38:31
Question: I created an HBase table, specifying the default number of versions as 10: create 'tablename',{NAME => 'cf', VERSIONS => 10} and inserted two rows (row1 and row2): put 'tablename','row1','cf:id','row1id' put 'tablename','row1','cf:name','row1name' put 'tablename','row2','cf:id','row2id' put 'tablename','row2','cf:name','row2name' put 'tablename','row2','cf:name','row2nameupdate' put 'tablename','row2','cf:name','row2nameupdateagain' put 'tablename','row2','cf:name','row2nameupdateonemoretime' Tried to
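HBase keeps up to VERSIONS timestamped cells per column and returns only the newest by default; older versions are visible only when a read explicitly asks for them. That retention semantic can be modeled in a few lines of plain Python (a toy illustration, nothing HBase-specific):

```python
class VersionedColumn:
    """Toy model of one HBase column with VERSIONS retention."""

    def __init__(self, max_versions=10):
        self.max_versions = max_versions
        self.cells = []  # list of (timestamp, value)

    def put(self, ts, value):
        self.cells.append((ts, value))
        self.cells.sort()
        # Retention: keep only the newest max_versions cells.
        self.cells = self.cells[-self.max_versions:]

    def get(self, versions=1):
        # Default read returns only the newest cell, like an HBase get.
        return [v for _, v in sorted(self.cells, reverse=True)[:versions]]

col = VersionedColumn(max_versions=10)
for ts, v in enumerate(["row2name", "row2nameupdate",
                        "row2nameupdateagain", "row2nameupdateonemoretime"]):
    col.put(ts, v)

print(col.get())            # ['row2nameupdateonemoretime']
print(col.get(versions=4))  # all four values, newest first
```

This is why a plain query through a Hive-over-HBase mapping typically surfaces only the latest value per cell: reaching the older versions requires a read that requests multiple versions (or a timestamp/time-range), not just the default single-version view.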

Configuring and running the Spark SQL Thrift Server on CDH 5.3.2

二次信任 submitted on 2019-12-11 11:29:38
Part 1, environment: a CDH cluster. For details on deploying CDH 5.x with Cloudera Manager 5, see: http://blog.csdn.net/freedomboy319/article/details/44804721
Part 2, configuring and running the Spark SQL Thrift Server on CDH 5.3.2:
1. Log in as root on one node of the CDH 5.3.2 cluster.
2. cd /opt/cloudera/parcels/CDH/lib/spark/sbin and run ./start-thriftserver.sh --help
3. Run ./start-thriftserver.sh
4. Go to the directory /opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0.10/lib/spark/logs and check the log file spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-cdh-node3.grc.out, which reports the following error: Spark Command: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp ::/opt/cloudera/parcels/CDH-5.3.2-1.cdh5.3.2.p0