Cloudera

Oozie job submission fails

Posted by 非 Y 不嫁゛ on 2019-12-13 20:31:31
Question: I am trying to submit an example MapReduce Oozie job, and all the properties are configured properly with regard to the path, NameNode, JobTracker port, etc. I validated the workflow.xml too. When I deploy the job I get a job ID, but when I check the status I see KILLED, and the details basically say that /var/tmp/oozie/oozie-oozi7188507762062318929.dir/map-reduce-launcher.jar does not exist.

Answer 1: In order to resolve this error, just create the HDFS folders and give appropriate permissions
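If scripting the fix is preferable to running hdfs dfs by hand, a minimal sketch with the Hadoop FileSystem API might look like the following; the directory path comes from the error message above, and the wide-open permissions are an assumption suitable only for a test setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class CreateOozieTmpDirs {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from core-site.xml on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Directory taken from the launcher error; adjust to whatever path Oozie reports.
            Path oozieTmp = new Path("/var/tmp/oozie");
            if (!fs.exists(oozieTmp)) {
                fs.mkdirs(oozieTmp);
            }
            // Open up permissions so the Oozie launcher can write its staging jars (test setup only).
            fs.setPermission(oozieTmp, new FsPermission((short) 0777));
            fs.close();
        }
    }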

Configure edge node to launch Hadoop jobs on cluster running on a private network

Posted by 南楼画角 on 2019-12-13 17:04:46
Question: I am trying to set up an edge node for a cluster at my workplace. The cluster is CDH 5.* Hadoop YARN; it has its own internal private high-speed network, and the edge node is outside that private network. I ran the steps for Hadoop client setup and configured core-site.xml:

    sudo apt-get install hadoop-client

Since the cluster is hosted on its own private network, the IP addresses on the internal network are different:

    10.100.100.1 - NameNode
    10.100.100.2 - Data Node 1
    10.100.100.4 - Data Node 2
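To sanity-check the client setup from the edge node, a small probe like the one below can confirm whether the NameNode is reachable at all; the hostname and port are hypothetical and must be an address that is routable from outside the private network (the 10.100.100.x addresses are not):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class EdgeNodeSmokeTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical externally routable NameNode address; normally this
            // comes from fs.defaultFS in the edge node's core-site.xml.
            conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
            fs.close();
        }
    }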

Hive tables not found in Spark SQL - spark.sql.AnalysisException in Cloudera VM

Posted by 旧时模样 on 2019-12-13 12:22:27
Question: I am trying to access Hive tables through a Java program, but it looks like my program does not see any tables in the default database. I can, however, see the same tables and query them through spark-shell. I have copied hive-site.xml into the Spark conf directory. The only difference: spark-shell is running Spark 1.6.0, while my Java program runs Spark 2.1.0.

    package spark_210_test;
    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.sql.Dataset;
    import org
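A common culprit when moving from Spark 1.6 to 2.x is the entry point: in Spark 2.x, Hive tables are reached through a SparkSession built with enableHiveSupport(); without it the program uses an in-memory catalog and sees no Hive tables. A minimal sketch (the queried table name is hypothetical):

    import org.apache.spark.sql.SparkSession;

    public class HiveTableCheck {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("HiveTableCheck")
                    .enableHiveSupport()   // required to see the Hive metastore tables
                    .getOrCreate();

            spark.sql("SHOW TABLES").show();
            spark.sql("SELECT * FROM some_table LIMIT 10").show(); // hypothetical table name
            spark.stop();
        }
    }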

How to resolve 'file could only be replicated to 0 nodes, instead of 1' in hadoop?

Posted by 不想你离开。 on 2019-12-13 11:42:02
Question: I have a simple Hadoop job that crawls websites and caches them to HDFS. The mapper checks whether a URL already exists in HDFS and, if so, uses it; otherwise it downloads the page and saves it to HDFS. If a network error (404, etc.) is encountered while downloading the page, the URL is skipped entirely and not written to HDFS. Whenever I run a small list of ~1,000 websites, I always seem to encounter this error, which crashes the job repeatedly in my pseudo-distributed installation. What
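For reference, the check-then-cache pattern the question describes might be sketched roughly like this; the cache layout, paths, and the download stub are all hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper illustrating the crawl-and-cache flow described above.
    public class CrawlCacheMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String url = value.toString().trim();
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path cached = new Path("/cache/" + url.hashCode()); // hypothetical cache layout

            if (!fs.exists(cached)) {
                byte[] page = download(url);   // placeholder for the HTTP fetch
                if (page == null) {
                    return;                    // skip URLs that fail with 404 etc.
                }
                try (FSDataOutputStream out = fs.create(cached)) {
                    out.write(page);           // each page becomes one small HDFS file
                }
            }
            context.write(new Text(url), new Text(cached.toString()));
        }

        private byte[] download(String url) {
            return null; // stubbed out; the real job would fetch the page here
        }
    }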

Issue with WITH clause with Cloudera JDBC Driver for Impala - Returning column name instead of actual Data

Posted by 时间秒杀一切 on 2019-12-13 07:38:21
Question: I am using the Cloudera JDBC Driver for Impala v2.5.38 with Spark 1.6.0 to create a DataFrame. It works fine for all queries except those using a WITH clause, and WITH is used extensively in my organization. Below is my code snippet.

    def jdbcHDFS(url: String, sql: String): DataFrame = {
      var rddDF: DataFrame = null
      val jdbcURL = s"jdbc:impala://$url"
      val connectionProperties = new java.util.Properties
      connectionProperties.setProperty("driver", "com.cloudera.impala.jdbc41.Driver")
      rddDF = sqlContext.read.jdbc
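One way to separate the driver's behaviour from Spark's query handling is to run the CTE directly over plain JDBC with the same driver class; a minimal probe, using a hypothetical Impala host and port, might look like this:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ImpalaWithClauseTest {
        public static void main(String[] args) throws Exception {
            Class.forName("com.cloudera.impala.jdbc41.Driver");
            // Hypothetical host/port; replace with the Impala daemon used above.
            try (Connection conn = DriverManager.getConnection("jdbc:impala://impala-host:21050");
                 Statement stmt = conn.createStatement()) {
                String sql = "WITH t AS (SELECT 1 AS id) SELECT id FROM t"; // minimal CTE probe
                try (ResultSet rs = stmt.executeQuery(sql)) {
                    while (rs.next()) {
                        System.out.println(rs.getInt("id"));
                    }
                }
            }
        }
    }

If this returns real data, the problem lies in how the statement is passed to Spark's JDBC reader rather than in the driver itself.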

My MapReduce Program produces a zero output

Posted by 不问归期 on 2019-12-13 06:29:09
Question: The output folder has a part-00000 file with no content! Here is the command trace, where I see no exception:

    [cloudera@localhost ~]$ hadoop jar testmr.jar TestMR /tmp/example.csv /user/cloudera/output
    14/02/06 11:45:24 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
    14/02/06 11:45:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    14/02/06 11:45:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the
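For comparison, a minimal skeleton in the org.apache.hadoop.mapreduce API is sketched below; with no exception in the trace, a useful first check is whether map() ever reaches context.write() for the records in the input CSV:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TestMRSkeleton {

        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Emit one record per CSV field so the output can never be empty
                // for a non-empty input file.
                for (String field : value.toString().split(",")) {
                    context.write(new Text(field), ONE);
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "TestMR skeleton");
            job.setJarByClass(TestMRSkeleton.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }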

Hive not detecting timestamp format

Posted by 送分小仙女□ on 2019-12-13 04:58:25
Question: I have a Pig script that:

- loads and transforms the data from a CSV,
- replaces some characters,
- calls a Java program (JAR) to convert the date-time in the CSV from 06/02/2015 18:52 to 2015-6-2 18:52 (MM/dd/yyyy to yyyy-MM-dd).

    REGISTER /home/cloudera/DateTime.jar;
    A = LOAD '/user/cloudera/Data.csv' USING PigStorage(',') AS (ac, datetime, amt, trace);
    B = FOREACH A GENERATE ac, REPLACE(datetime, '\\/', '-') AS newdate, REPLACE(amt, '-', '') AS newamt, trace;
    C = FOREACH B GENERATE ac, Converter.DateTime(newdate)
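Hive's TIMESTAMP type expects text values in the form yyyy-MM-dd HH:mm:ss (optionally with fractional seconds), so an unpadded value like 2015-6-2 18:52 typically loads as NULL. A hypothetical stand-in for the conversion done by DateTime.jar, shown as a plain helper rather than a wired-up Pig UDF, could keep the zero padding and add the missing seconds:

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Hypothetical replacement for the Converter.DateTime helper registered above.
    public class Converter {
        public static String DateTime(String input) throws Exception {
            // Input arrives as "06-02-2015 18:52" after the REPLACE of '/' with '-'.
            SimpleDateFormat in  = new SimpleDateFormat("MM-dd-yyyy HH:mm");
            SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date parsed = in.parse(input);
            return out.format(parsed);   // e.g. "2015-06-02 18:52:00", a valid Hive TIMESTAMP
        }
    }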

Flume: Data transferring to Server

Posted by 别来无恙 on 2019-12-13 04:52:40
Question: I am new to Flume-ng. I have to write a program that can transfer a text file to another program (an agent). I know we need the agent's details, i.e. host IP, port number, etc., and then a source, sink, and channel should be defined. I just want to transfer a log file to the server. My client code is as follows.

    public class MyRpcClientFacade {
        public class MyClient {
            private RpcClient client;
            private String hostname;
            private int port;

            public void init(String hostname, int port) {
                this.hostname = hostname;
                this.port = port;
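For this kind of client, the Flume SDK's RpcClient is the usual route; a minimal sketch that streams a log file to an agent's Avro source might look like the following, where the host, port, and file path are placeholders:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import org.apache.flume.Event;
    import org.apache.flume.EventDeliveryException;
    import org.apache.flume.api.RpcClient;
    import org.apache.flume.api.RpcClientFactory;
    import org.apache.flume.event.EventBuilder;

    public class LogFileSender {
        public static void main(String[] args) throws Exception {
            // Placeholder agent address; it must match the Avro source configured
            // on the agent side (source type avro, plus its bind address and port).
            RpcClient client = RpcClientFactory.getDefaultInstance("agent-host", 41414);
            try {
                List<String> lines = Files.readAllLines(Paths.get("/var/log/app.log")); // placeholder file
                for (String line : lines) {
                    Event event = EventBuilder.withBody(line, StandardCharsets.UTF_8);
                    client.append(event); // one Flume event per log line
                }
            } catch (EventDeliveryException e) {
                System.err.println("Delivery failed: " + e.getMessage());
            } finally {
                client.close();
            }
        }
    }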

Cloudera Manager Express Wizard Does Not Detect EC2

Posted by 99封情书 on 2019-12-13 04:50:33
Question: I am attempting to install Cloudera Manager on an EC2 instance following these directions. They indicate that once I have installed Manager and navigated to the EC2 host page at port 7180, it will automatically detect that it is running on an EC2 instance and allow me to have it deploy and configure my Hadoop cluster. The docs mention an EC2-related warning message on the welcome screen. Instead, I get a different welcome screen that doesn't mention EC2. When I click Continue I get a second

How to create External Table on Hive from data on local disk instead of HDFS?

Posted by 倾然丶 夕夏残阳落幕 on 2019-12-12 17:25:40
Question: For data on HDFS, we can do:

    CREATE EXTERNAL TABLE <table> (
      id INT,
      name STRING,
      age INT
    )
    LOCATION 'hdfs_path';

But how do I specify a local path for the LOCATION above? Thanks.

Answer 1: You can upload the file to HDFS first using "hdfs dfs -put" and then create the Hive external table on top of it. The reason Hive cannot create an external table over a local file is that when Hive processes data, the actual processing happens on the Hadoop cluster, where your local file may not be accessible at
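The same upload step can also be done programmatically with the Hadoop FileSystem API before creating the external table over the target directory; a small sketch, with hypothetical local and HDFS paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UploadForExternalTable {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // Hypothetical paths: copy the local file into the directory that the
            // external table's LOCATION clause will point at.
            Path local = new Path("file:///home/cloudera/people.csv");
            Path tableDir = new Path("/user/cloudera/people_ext/");
            fs.mkdirs(tableDir);
            fs.copyFromLocalFile(local, tableDir);
            fs.close();
            // Afterwards: CREATE EXTERNAL TABLE ... LOCATION '/user/cloudera/people_ext';
        }
    }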