cloudera-cdh

Unable to run example Spark job with Oozie

99封情书 submitted on 2019-12-11 12:23:32
Question: I'm trying to set up Oozie on a CDH 5.7 cluster. I've installed and configured everything by following the steps in the Cloudera documentation. Finally, I extracted oozie-examples.tar.gz, -put it to HDFS, and tried to run some examples. The MR example runs fine, but the Spark one fails with the following error: Resource hdfs://cluster/user/hdfs/.sparkStaging/application_1462195303197_0009/oozie-examples.jar changed on src filesystem (expected 1462196523983, was 1462196524951 The command I used to run the
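A frequently reported cause of the "changed on src filesystem" failure is the same jar reaching the Spark action twice, once from the Oozie sharelib and once from the workflow's lib/ directory, so it is staged to .sparkStaging with two different timestamps. A quick way to check, as a sketch only (the Oozie URL and the example paths are placeholders for your cluster):

    # What does the Spark sharelib already provide? (Oozie 4.x client)
    oozie admin -oozie http://oozie-server:11000/oozie -shareliblist spark

    # What is the workflow itself shipping? (path assumes the extracted examples)
    hdfs dfs -ls /user/hdfs/examples/apps/spark/lib

    # If the same jar shows up in both places, keep a single copy so only one
    # version is staged for the YARN application.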

Hadoop MapReduce (YARN) using hosts with different power/specifications

我是研究僧i submitted on 2019-12-11 12:09:03
Question: I currently have high-power (CPU/RAM) hosts in the cluster, and we are considering adding some hosts with good storage but low power. My concern is that this will reduce job performance. Mappers/reducers on the new (less powerful) hosts will run slower, and the more powerful ones will just have to wait for the results. Is there a way to configure this in YARN? Maybe set a priority for the hosts, or assign mappers/reducers according to the number of cores on each machine. Thanks, Horatiu Answer 1:
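YARN does not rank hosts by speed, but each NodeManager advertises its own container resources, so the smaller hosts can simply offer fewer and smaller containers and the scheduler will place work accordingly. A minimal yarn-site.xml sketch for one of the low-power hosts, with example values only:

    <!-- yarn-site.xml on the low-power host; sizes are illustrative -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value>   <!-- memory this NodeManager offers to containers -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value>      <!-- vcores this NodeManager offers to containers -->
    </property>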

Cloudera CDH 5.7.2 / HBase: How to Set hfile.format.version?

假如想象 submitted on 2019-12-11 06:21:09
Question: With CDH 5.7.2-1.cdh5.7.2.po.18, I am trying to use Cloudera Manager to configure HBase to use visibility labels and authorizations, as described in the Cloudera Community post below: Cloudera Manager Hbase Visibility Labels. Using Cloudera Manager, I have successfully updated the values of the following properties: hbase.coprocessor.region.classes: set to org.apache.hadoop.hbase.security.visibility.VisibilityController hbase.coprocessor.master.classes: set to org.apache.hadoop.hbase.security
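hfile.format.version has no dedicated field in Cloudera Manager, so the usual route is the HBase "Advanced Configuration Snippet (Safety Valve) for hbase-site.xml"; format version 3 is what visibility labels and cell-level security require. A sketch of the snippet, assuming the safety-valve approach is acceptable in your setup:

    <!-- HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml -->
    <property>
      <name>hfile.format.version</name>
      <value>3</value>
    </property>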

Why does importing SparkSession in spark-shell fail with “object SparkSession is not a member of package org.apache.spark.sql”?

送分小仙女□ submitted on 2019-12-10 23:49:43
Question: I use Spark 1.6.0 on my VM, a Cloudera machine. I'm trying to insert some data into a Hive table from the Spark shell. To do that, I am trying to use SparkSession. But the import below does not work. scala> import org.apache.spark.sql.SparkSession <console>:33: error: object SparkSession is not a member of package org.apache.spark.sql import org.apache.spark.sql.SparkSession And without that, I cannot execute this statement: val spark = SparkSession.builder.master("local[2]").enableHiveSupport()
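SparkSession only exists from Spark 2.0 onwards, so a 1.6.0 spark-shell cannot import it. The Spark 1.6 entry point for Hive access is HiveContext (the shell's predefined sqlContext is usually already one on a CDH build with Hive); a minimal sketch using the shell's existing SparkContext:

    // Spark 1.6.x: use HiveContext instead of the not-yet-existing SparkSession
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)      // sc is the spark-shell's SparkContext
    hiveContext.sql("SHOW TABLES").show()      // example Hive statement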

Spark Streaming application fails with KafkaException: String exceeds the maximum size or with IllegalArgumentException

一世执手 submitted on 2019-12-10 19:46:15
Question: TL;DR: My very simple Spark Streaming application fails in the driver with "KafkaException: String exceeds the maximum size". I see the same exception in the executor, but I also found, further down the executor's logs, an IllegalArgumentException with no other information in it. Full problem: I'm using Spark Streaming to read some messages from a Kafka topic. This is what I'm doing: val conf = new SparkConf().setAppName("testName") val streamingContext = new StreamingContext(new
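The excerpt's setup is cut off mid-constructor; for orientation, a minimal Spark 1.x Kafka direct-stream skeleton along the same lines looks like the sketch below (the broker address, topic name, and batch interval are placeholders, and it needs the spark-streaming-kafka artifact on the classpath):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("testName")
    val streamingContext = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // placeholder broker
    val topics = Set("my-topic")                                      // placeholder topic

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      streamingContext, kafkaParams, topics)

    stream.map(_._2).print()    // just print the message values
    streamingContext.start()
    streamingContext.awaitTermination()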

HDFS Capacity: how to read “dfsadmin report”

空扰寡人 submitted on 2019-12-10 17:19:36
Question: I am using Hadoop 2.6.0. When I run "hdfs dfsadmin -report" I get something like this (simplified): Configured Capacity: 3 TB Present Capacity: 400 GB DFS Remaining: 300 GB DFS Used: 100 GB I am wondering what "Configured Capacity" is and what "Present Capacity" is. It looks like "Present Capacity" is the one in effect. How can I increase it? Answer 1: Configured Capacity is the total available capacity of the disks/volumes used for the data directories. E.g., I have three 1 TB disks mounted on /Hadoop/sdb1
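Using the question's own numbers, the lines of the report relate as follows; the block below is a worked example, not output from a real cluster:

    hdfs dfsadmin -report | head
    # Configured Capacity: 3 TB    -> total size of the disks behind dfs.datanode.data.dir
    # Present Capacity:    400 GB  -> Configured Capacity minus non-DFS usage
    #                                 (other files on those mounts + dfs.datanode.du.reserved)
    # DFS Used:            100 GB
    # DFS Remaining:       300 GB  -> Present Capacity (400 GB) - DFS Used (100 GB)
    #
    # To raise Present Capacity: free or move the non-HDFS data on those mounts,
    # lower dfs.datanode.du.reserved, or add disks to dfs.datanode.data.dir.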

Hadoop NameNode port in use

早过忘川 submitted on 2019-12-10 12:57:25
Question: This is actually a standby HA NameNode. It was configured with the same settings as the primary, and hdfs namenode -bootstrapStandby was run successfully. It begins coming up on the standard HTTP port 50070, as defined in the config file: <property> <name>dfs.namenode.http-address.ha-hadoop.namenode2</name> <value>namenode2:50070</value> </property> The startup begins OK, then hits: 15/02/02 08:06:17 INFO hdfs.DFSUtil: Starting Web-server for hdfs at: http://hadoop1:50070 15/02/02 08:06:17 INFO
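The excerpt cuts off before the exception itself; assuming it is the usual "Address already in use" bind failure, two things are worth checking: whether another process already holds the port, and why the log binds to hadoop1:50070 when the quoted property names namenode2 (which suggests the standby may be resolving the wrong http-address key). A quick check on the standby host:

    # Is something already bound to the NameNode HTTP port?
    netstat -tlnp | grep 50070     # or: ss -tlnp | grep 50070

    # If a process is listening, stop it, or move dfs.namenode.http-address.* to a free port.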

YARN UNHEALTHY nodes

牧云@^-^@ submitted on 2019-12-09 17:16:31
Question: In our YARN cluster, which is 80% full, we are seeing that some of the YARN NodeManagers are marked as UNHEALTHY. After digging into the logs I found it is because disk usage is over 90% for the data dirs, with the following error: 2015-02-21 08:33:51,590 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node hdp009.abc.com:8041 reported UNHEALTHY with details: 4/4 local-dirs are bad: /data3/yarn/nm,/data2/yarn/nm,/data4/yarn/nm,/data1/yarn/nm; 2015-02-21 08:33:51,590 INFO org.apache.hadoop
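The NodeManager's disk health checker marks a local dir bad once its utilization crosses a threshold (90% by default), which matches the "4/4 local-dirs are bad" message. Cleaning up or expanding the disks is the real fix, but the threshold itself is configurable; a sketch of the property, with an example value:

    <!-- yarn-site.xml: utilization (percent) above which a local dir is marked bad;
         the default is 90.0, raise it only as a stop-gap -->
    <property>
      <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
      <value>95.0</value>
    </property>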

Can ETL Informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

冷暖自知 submitted on 2019-12-08 15:51:51
Question: We are trying to do a proof of concept on Informatica Big Data Edition (not the cloud version), and I have seen that we might be able to use HDFS and Hive as source and target. But my question is: does Informatica connect to Cloudera Impala? If so, do we need an additional connector for that? I have done comprehensive research to check whether this is supported but could not find anything. Did anyone already try this? If so, can you specify the steps and link to any documentation? Informatica
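Independently of whether Informatica ships a dedicated Impala connector, Impala is reachable over standard JDBC/ODBC, which is the channel most third-party tools use. A connectivity sketch only; the driver class name, host, port, and URL format are assumptions to verify against the Cloudera Impala JDBC driver documentation:

    import java.sql.DriverManager

    object ImpalaJdbcCheck {
      def main(args: Array[String]): Unit = {
        // Assumed driver class and URL format for the Cloudera Impala JDBC driver
        Class.forName("com.cloudera.impala.jdbc41.Driver")
        val conn = DriverManager.getConnection("jdbc:impala://impala-host:21050/default")
        val rs   = conn.createStatement().executeQuery("SHOW TABLES")
        while (rs.next()) println(rs.getString(1))
        conn.close()
      }
    }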

Error: java.lang.IllegalArgumentException: Comparison method violates its general contract even using workaround

梦想的初衷 submitted on 2019-12-08 10:02:48
Question: I have already spent two days trying to sort out this error. I even tried the workaround suggested in several Stack Overflow posts, "-Djava.util.Arrays.useLegacyMergeSort=true", but it doesn't work either. These are the details of my command and the error it returns: Command: hadoop jar CloudBrush.jar -Djava.awt.headless=true -Djava.util.Arrays.useLegacyMergeSort=true -reads /Ec10k -asm Ec10k_Brush -k 21 -readlen 36 Error: Error: java.lang.IllegalArgumentException: Comparison method violates its
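-D options placed after the jar in "hadoop jar" are handed to the program's main class (or, if it uses ToolRunner, become Hadoop configuration properties); they never turn into JVM flags for the map and reduce tasks, which is where the offending sort actually runs. A sketch of pushing the flag into the task JVMs instead, assuming the CloudBrush driver accepts generic options (otherwise the same properties can be set in mapred-site.xml):

    # Route the legacy-mergesort flag to the map/reduce child JVMs (MR2 property names).
    # Note: setting these on the command line replaces any default child java opts
    # (e.g. heap sizes), so include those too if your cluster relies on them.
    hadoop jar CloudBrush.jar \
      -Dmapreduce.map.java.opts="-Djava.util.Arrays.useLegacyMergeSort=true" \
      -Dmapreduce.reduce.java.opts="-Djava.util.Arrays.useLegacyMergeSort=true" \
      -reads /Ec10k -asm Ec10k_Brush -k 21 -readlen 36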