cloudera-cdh

Is it possible to load a parquet table directly from a file?

牧云@^-^@ submitted on 2019-12-02 01:28:35
If I have a binary data file (it can be converted to CSV format), is there any way to load a parquet table directly from it? Many tutorials show loading a CSV file into a text table and then loading the text table into a parquet table. From an efficiency point of view, is it possible to load a parquet table directly from a binary file like the one I already have, ideally using the create external table command? Or do I need to convert it to a CSV file first? Is there any file format restriction? Unfortunately it is not possible to read from a custom binary format in Impala. You should convert your files to csv, then
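For reference, the two-step route the answer points to can be written as a pair of Impala statements; a minimal sketch, assuming hypothetical table names, columns, and an HDFS path:

```sql
-- Step 1: external text table over the CSV files already in HDFS
-- (table name, columns, and LOCATION are illustrative).
CREATE EXTERNAL TABLE staging_csv (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging_csv';

-- Step 2: a single CTAS statement reads the text table
-- and writes the same rows out as Parquet files.
CREATE TABLE events_parquet
STORED AS PARQUET
AS SELECT * FROM staging_csv;
```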

java.lang.AbstractMethodError, org.apache.spark.internal.Logging$class.initializeLogIfNecessary

廉价感情. submitted on 2019-12-02 00:31:02
I'm running Kafka producer and consumer code for testing purposes in CDH 5.12, and I'm hitting the error below while running the consumer code. dataSet: org.apache.spark.sql.Dataset[(String, String)] = [key: string, value: string] query: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@109a5573 2018-10-25 10:08:37 ERROR MicroBatchExecution:91 - Query [id = 70bc4f7a-cc41-470d-afd0-d46e5aebf3db, runId = 4d974468-6c6b-47e5-976b-8b9aa98114e2] terminated with error java.lang.AbstractMethodError at org.apache.spark
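An AbstractMethodError on Logging$class.initializeLogIfNecessary is typically a binary-compatibility problem: the Kafka source (or another dependency) was compiled against a different Spark version than the one running on the cluster, so a trait method the JVM expects at runtime is missing. A minimal sketch of the usual guard, assuming an sbt build (the version shown is illustrative; match it to the cluster's Spark):

```scala
// build.sbt: keep every Spark artifact on the exact version and
// Scala binary version that the cluster actually runs.
val sparkVersion = "2.2.0"  // illustrative; use your cluster's version

libraryDependencies ++= Seq(
  // "provided" scope: use the cluster's Spark jars at runtime
  // instead of bundling a second, possibly incompatible copy.
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // The Kafka source must match spark-sql exactly, or trait methods
  // such as initializeLogIfNecessary will not line up at runtime.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)
```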

How to check whether a file exists in an HDFS location, using Oozie?

醉酒当歌 submitted on 2019-12-01 22:24:36
Question: How can I check whether a file exists in an HDFS location, using Oozie? In my HDFS location I receive a file like test_08_01_2016.csv at 11 PM on a daily basis. I want to check whether this file exists after 11:15 PM. I can schedule the batch using an Oozie coordinator job, but how can I validate that the file exists in HDFS? Answer 1: You can use an EL expression in Oozie like: <decision name="CheckFile"> <switch> <case to="nextOozieTask"> ${fs:exists('/path/test_08_01_2016.csv')} <!--do note the
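The truncated snippet can be fleshed out into a complete decision node; a minimal sketch of a workflow around it, where the action name's target, the marker path, and the kill transitions are illustrative additions:

```xml
<workflow-app name="file-check-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="CheckFile"/>
    <decision name="CheckFile">
        <switch>
            <!-- fs:exists is a built-in Oozie EL function returning
                 true when the given HDFS path exists -->
            <case to="nextOozieTask">${fs:exists('/path/test_08_01_2016.csv')}</case>
            <default to="fileMissing"/>
        </switch>
    </decision>
    <!-- Stand-in for the real downstream action -->
    <action name="nextOozieTask">
        <fs>
            <touchz path="/path/_file_found"/>
        </fs>
        <ok to="end"/>
        <error to="fileMissing"/>
    </action>
    <kill name="fileMissing">
        <message>test_08_01_2016.csv not found in HDFS</message>
    </kill>
    <end name="end"/>
</workflow-app>
```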

How are Hive SQL queries submitted as MR jobs from the Hive CLI?

余生颓废 submitted on 2019-12-01 21:12:36
I have deployed a CDH 5.9 cluster with MR as the Hive execution engine. I have a Hive table named "users" with 50 rows. The query select * from users always works fine: hive> select * from users; OK Adam 1 38 ATK093 CHEF Benjamin 2 24 ATK032 SERVANT Charles 3 45 ATK107 CASHIER Ivy 4 30 ATK384 SERVANT Linda 5 23 ATK132 ASSISTANT . . . Time taken: 0.059 seconds, Fetched: 50 row(s) But select max(age) from users fails after being submitted as an MR job, and the container log has no information that explains the failure. hive> select max(age) from
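One detail worth knowing here: a bare select * succeeds without touching MapReduce at all, because Hive serves it through its fetch task, while an aggregate like max(age) must launch an MR job. A debugging sketch (not the resolution from the original thread): force even simple scans through MR, so the failure can be isolated to the MR/YARN layer rather than the query itself:

```sql
-- hive.fetch.task.conversion controls when Hive answers a query
-- directly from HDFS instead of launching a MapReduce job;
-- 'none' forces every query through MR.
SET hive.fetch.task.conversion=none;

-- With the setting above, even this plain scan runs as an MR job;
-- if it now fails too, the MR/YARN configuration is the culprit.
SELECT * FROM users LIMIT 5;

-- The aggregate that originally failed; it always requires MR.
SELECT max(age) FROM users;
```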

Can I have multiple spark versions installed in CDH?

一曲冷凌霜 submitted on 2019-12-01 08:48:26
I'm using CDH 5.1.0, which already has a default Spark installed. However, I want to use Spark 1.3. Can I also install this version on CDH 5.1.0? How do I set this up? Will the new version of Spark also be monitored via Cloudera Manager? Yes, you can run any Apache Spark version you like. Just make sure it's built for the version of YARN you have (2.3 for CDH 5.1.0). You can then run your application as a YARN application with spark-submit. (See http://spark.apache.org/docs/latest/running-on-yarn.html.) It will be monitored like any other YARN application. Spark doesn't need to be
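A minimal sketch of what submitting with a separately downloaded Spark build could look like; the install path, class name, and jar are placeholders, not from the original answer:

```bash
# Point the standalone Spark build at the cluster's Hadoop/YARN
# configuration so spark-submit can find the ResourceManager and HDFS.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Use spark-submit from the downloaded Spark 1.3 distribution rather
# than the CDH one; YARN monitors it like any other application.
/opt/spark-1.3.0-bin-hadoop2.3/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```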

Namenode HA (UnknownHostException: nameservice1)

人走茶凉 submitted on 2019-12-01 04:15:56
We enabled NameNode High Availability through Cloudera Manager, using Cloudera Manager >> HDFS >> Actions >> Enable High Availability, selected the standby NameNode and JournalNodes, and then the nameservice nameservice1. Once the whole process completed, we deployed the client configuration. We tested from a client machine by listing HDFS directories (hadoop fs -ls /), manually failing over to the standby NameNode, and listing the HDFS directories again (hadoop fs -ls /). This test worked perfectly. But when I ran a Hadoop sleep job using the following command, it failed: $ hadoop jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop-0
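UnknownHostException: nameservice1 generally means the configuration the failing process reads lacks the HA mappings, so the logical nameservice is treated as a literal hostname. For comparison, a minimal sketch of the hdfs-site.xml entries that define those mappings (the NameNode IDs and hosts are placeholders):

```xml
<!-- The logical name clients use in place of a single NameNode host -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode1</name>
  <value>nn-host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode2</name>
  <value>nn-host2.example.com:8020</value>
</property>
<!-- Tells clients how to find the currently active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```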

Datastax Cassandra Driver throwing CodecNotFoundException

ぐ巨炮叔叔 submitted on 2019-12-01 03:49:35
The exact exception is as follows: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal] These are the software versions I am using: Spark 1.5, Datastax Cassandra 3.2.1, CDH 5.5.1. The code I am trying to execute is a Spark program using the Java API; it basically reads data (CSVs) from HDFS and loads it into Cassandra tables. I am using the spark-cassandra-connector. I initially had a lot of issues with a conflict around Google's Guava library, which I was able to resolve by shading the Guava library and
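The exception itself is precise: the driver was asked to map java.math.BigDecimal onto a varchar column and has no codec for that pair, so either the column's CQL type or the bound Java type has to change. A minimal sketch of the latter, against the DataStax Java driver 3.x API, with a hypothetical keyspace, table, and column:

```java
import java.math.BigDecimal;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CodecExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {

            // Hypothetical table: "amount" is declared varchar in CQL.
            PreparedStatement ps = session.prepare(
                "INSERT INTO my_table (id, amount) VALUES (?, ?)");

            BigDecimal parsed = new BigDecimal("42.50");
            // Binding 'parsed' directly would raise CodecNotFoundException
            // [varchar <-> java.math.BigDecimal]; converting to String
            // matches the varchar column.
            session.execute(ps.bind("row-1", parsed.toPlainString()));
        }
    }
}
```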
