Cloudera

Oozie - Hadoop commands are not executing (Shell)

Submitted by 巧了我就是萌 on 2020-01-04 05:30:55
Question: I am running a shell script that contains hadoop commands through Cloudera Hue - Oozie. When the script has no hadoop commands it executes successfully, but otherwise I get the following error:

Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

I have set oozie.use.system.libpath=true and can see my libs in /user/oozie/share/lib/<lib_timestamp>. Below is the shell script I am trying to run:

#! /bin/bash
$(hadoop fs -mkdir
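
The excerpt cuts off before the workflow details, but since the question centres on oozie.use.system.libpath, here is a minimal sketch of submitting such a workflow through the Oozie Java client, showing where that property is set. The Oozie URL and HDFS application path are hypothetical placeholders, not values from the question.

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.OozieClientException;

public class SubmitShellWorkflow {
    public static void main(String[] args) throws OozieClientException {
        // Hypothetical Oozie server URL.
        OozieClient client = new OozieClient("http://quickstart.cloudera:11000/oozie");
        Properties conf = client.createConfiguration();
        // Hypothetical HDFS path to the workflow application.
        conf.setProperty(OozieClient.APP_PATH,
                "hdfs://quickstart.cloudera:8020/user/hue/oozie/workspaces/shell-wf");
        // Without this, the shell action may not find the Hadoop client jars
        // in the share lib, a common cause of ShellMain exit code [1].
        conf.setProperty("oozie.use.system.libpath", "true");
        String jobId = client.run(conf);  // submit and start the workflow
        System.out.println("Workflow job submitted: " + jobId);
    }
}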

How to get consumers to work in Kafka 0.8 API

Submitted by 北城以北 on 2020-01-03 05:16:09
Question: I am about to write a prototype for publishing and consuming Kafka messages. We already have a Cloudera infrastructure set up (ZooKeepers, brokers, etc.), and I have successfully used the Kafka command-line tools to produce and consume messages. I am using [org.apache.kafka/kafka_2.10 "0.8.2.1"] as a dependency and have already been able to use the client API to set up a KafkaProducer which publishes messages with plain String content, and can be successfully read by the
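
The excerpt ends before the consumer side. For the 0.8.x line, the usual counterpart to the producer is the high-level consumer API (kafka.javaapi.consumer.ConsumerConnector); the sketch below assumes placeholder values for the ZooKeeper address, group id, and topic name.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class SimpleConsumer08 {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zkhost:2181");   // placeholder ZooKeeper address
        props.put("group.id", "prototype-group");        // placeholder consumer group
        props.put("auto.offset.reset", "smallest");      // start from the earliest offset
        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put("my-topic", 1);                // one stream (thread) for the topic
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}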

Connection refused to quickstart.cloudera:8020

Submitted by 旧城冷巷雨未停 on 2020-01-03 02:41:08
Question: I'm using the Cloudera QuickStart 5.5.0 VirtualBox image and trying to run this in the terminal. As you can see below, there is an exception. I've searched the internet for a solution and found the following:

1) Configuring the core-site.xml file: https://datashine.wordpress.com/2014/09/06/java-net-connectexception-connection-refused-for-more-details-see-httpwiki-apache-orghadoopconnectionrefused/

But I can only open this file as readable and haven't been able to change it. It seems I need to be root or hdfs user
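
Before editing core-site.xml (which does require root or the hdfs user on the QuickStart VM), it can help to confirm which address the client-side configuration actually resolves to. A small diagnostic sketch, assuming the Hadoop client jars and configuration are on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CheckDefaultFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // loads core-site.xml from the classpath
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println("Connected to: " + fs.getUri());
        }
    }
}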

Initialize Cloudera Hive Docker Container With Data

Submitted by 痴心易碎 on 2020-01-02 20:18:58
Question: I am running the Cloudera suite in a Docker container using the image described here: https://hub.docker.com/r/cloudera/quickstart/

I have the following configuration:

Dockerfile:

FROM cloudera/quickstart:latest

Docker Compose file:

version: '3.1'
services:
  db-hive:
    container_name: mobydq-test-db-hive
    image: mobydq-test-db-hive
    restart: always
    build:
      context: .
      dockerfile: ./db-hive/Dockerfile
    expose:
      - 10000
    networks:
      - default
    hostname: quickstart.cloudera
    privileged: true
    tty: true
    command:
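
The configuration cuts off at the command, but once the container's HiveServer2 is listening on the exposed port 10000, one way to load initial data is a plain JDBC session. A minimal sketch, assuming the standard Hive JDBC driver; the table, credentials, and file path are hypothetical, since the excerpt does not show how the asker intended to initialize the data.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class InitHiveData {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:hive2://quickstart.cloudera:10000/default", "cloudera", "cloudera");
             Statement stmt = conn.createStatement()) {
            // Hypothetical table and local file, purely for illustration.
            stmt.execute("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");
            stmt.execute("LOAD DATA LOCAL INPATH '/tmp/demo.csv' INTO TABLE demo");
        }
    }
}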

Get Line number in map method using FileInputFormat

Submitted by 生来就可爱ヽ(ⅴ<●) on 2020-01-02 10:19:37
Question: I was wondering whether it is possible to get the line number in my map method. My input file is just a single column of values, like:

Apple
Orange
Banana

Is it possible to get key: 1, value: Apple; key: 2, value: Orange; ... in my map method? Using CDH3/CDH4. Changing the input data so as to use KeyValueInputFormat is not an option. Thanks ahead.

Answer 1: The default behaviour of InputFormats such as TextInputFormat is to give the byte offset of the record rather than the actual line number -
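
The answer breaks off above, but its gist (the map key is a byte offset, not a line number) points to a workaround that only holds when the whole file reaches a single mapper, e.g. one small, unsplit file: count lines inside the mapper itself. A sketch of that caveat-laden approach:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineNumberMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    // Only meaningful if the input file is processed as a single split;
    // with multiple splits each mapper restarts its own count.
    private long lineNumber = 0;

    @Override
    protected void map(LongWritable byteOffset, Text value, Context context)
            throws IOException, InterruptedException {
        lineNumber++;  // the incoming key is the byte offset, not the line number
        context.write(new LongWritable(lineNumber), value);
    }
}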

hdfs - ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException:

Submitted by …衆ロ難τιáo~ on 2020-01-02 01:14:05
Question: I am trying to use the command below to list my directories in HDFS:

ubuntu@ubuntu:~$ hadoop fs -ls hdfs://127.0.0.1:50075/
ls: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.; Host Details : local host is: "ubuntu/127.0.0.1"; destination host is: "ubuntu":50075;

Here is my /etc/hosts file:

127.0.0.1 ubuntu localhost
#127.0.1.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6
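
The usual reading of this error is that the client is talking to a port that does not speak the NameNode's RPC protocol: 50075 is the DataNode web port, while the NameNode RPC port is typically 8020. A minimal sketch pointing a client at the RPC port instead (the port value is the common default, assumed here rather than taken from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode RPC endpoint, not the DataNode HTTP port 50075.
        conf.set("fs.defaultFS", "hdfs://127.0.0.1:8020");
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}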

Client cannot authenticate via:[TOKEN, KERBEROS]

Submitted by 痞子三分冷 on 2020-01-01 06:13:06
Question: I'm using YarnClient to programmatically start a job. The cluster I'm running on has been Kerberized. Normal MapReduce jobs submitted via "yarn jar examples.jar wordcount..." work. The job I'm trying to submit programmatically does not. I get this error:

14/09/04 21:14:29 ERROR client.ClientService: Error happened during application submit: Application application_1409863263326_0002 failed 2 times due to AM Container for appattempt_1409863263326_0002_000002 exited with exitCode: -1000
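
The excerpt stops at the container exit code, but the TOKEN/KERBEROS message in the title typically means the client never obtained Kerberos credentials before talking to YARN. A minimal sketch of logging in from a keytab before creating the YarnClient; the principal and keytab path are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.yarn.client.api.YarnClient;

public class KerberizedSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab path.
        UserGroupInformation.loginUserFromKeytab(
                "appuser@EXAMPLE.COM", "/etc/security/keytabs/appuser.keytab");

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        // ... build and submit the ApplicationSubmissionContext here ...
        yarnClient.stop();
    }
}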

error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol

Submitted by 谁都会走 on 2020-01-01 03:25:12
Question: I'm currently trying to test the changes implemented to achieve security with Encrypted Shuffle in a Cloudera Hadoop environment. I've created the certificates and keystores and kept them in the appropriate locations. I'm testing the TaskTracker's HTTPS port, 50060. When I curl that port, I get the error response below.

ubuntu@node2:~$ curl -v -k "https://10.0.10.90:50060"
* About to connect() to 10.0.10.90 port 50060 (#0)
*   Trying 10.0.10.90... connected
* successfully set certificate verify
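
The "unknown protocol" error at the SSL23_GET_SERVER_HELLO stage usually means the server answered in plain HTTP rather than TLS, i.e. the HTTPS setting never took effect on that port. A quick probe (a sketch, reusing the host and port from the question) is to send a plain HTTP request and see whether an HTTP status line comes back:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class PortProbe {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("10.0.10.90", 50060);
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            out.println("GET / HTTP/1.0\r\n");
            // An "HTTP/1.1 ..." reply means the port serves plain HTTP, not TLS.
            System.out.println(in.readLine());
        }
    }
}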

Unable to configure ORC properties in Spark

Submitted by 江枫思渺然 on 2019-12-30 03:34:06
Question: I am using Spark 1.6 (Cloudera 5.8.2) and tried the methods below to configure ORC properties, but they do not affect the output. Below is the code snippet I tried:

DataFrame dataframe = hiveContext.createDataFrame(rowData, schema);
dataframe.write().format("orc").options(new HashMap(){
    {
        put("orc.compress","SNAPPY");
        put("hive.exec.orc.default.compress","SNAPPY");
        put("orc.compress.size","524288");
        put("hive.exec.orc.default.buffer.size","524288");
        put("hive.exec.orc.compression.strategy",
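
One commonly suggested workaround (an assumption to verify, not a confirmed fix) is that per-write options() are ignored by the ORC writer in Spark 1.6, so the Hive ORC defaults are set on the HiveContext itself before writing. A hedged sketch; the source table and output path are illustrative:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class OrcWriteSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("orc-write"));
        HiveContext hiveContext = new HiveContext(sc.sc());
        // Set the Hive ORC defaults on the context rather than per write;
        // whether these take effect depends on the Spark/CDH build.
        hiveContext.setConf("hive.exec.orc.default.compress", "SNAPPY");
        hiveContext.setConf("hive.exec.orc.default.buffer.size", "524288");
        DataFrame df = hiveContext.table("some_table");  // hypothetical source table
        df.write().format("orc").save("/tmp/orc_out");   // illustrative output path
    }
}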

Hadoop : Provide directory as input to MapReduce job

Submitted by 扶醉桌前 on 2019-12-29 05:27:07
Question: I'm using Cloudera Hadoop. I'm able to run a simple MapReduce program where I provide a file as input; this file contains all the other files to be processed by the mapper function. But I'm stuck at one point:

/folder1
  - file1.txt
  - file2.txt
  - file3.txt

How can I specify the input path to the MapReduce program as "/folder1", so that it starts processing each file inside that directory? Any ideas?

EDIT: 1) Initially, I provided the inputFile.txt as input to mapreduce
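
For the question as asked, FileInputFormat accepts a directory path directly, so every file inside it becomes input. A minimal sketch, assuming the new (org.apache.hadoop.mapreduce) API; the recursive flag needs Hadoop 2.x or later:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FolderInputJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "process-folder");
        job.setJarByClass(FolderInputJob.class);
        // A directory is accepted directly: every file in it becomes input.
        FileInputFormat.addInputPath(job, new Path("/folder1"));
        // Optional (Hadoop 2.x+): also descend into subdirectories.
        FileInputFormat.setInputDirRecursive(job, true);
        FileOutputFormat.setOutputPath(job, new Path("/folder1_out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}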