cloudera-cdh

Is it possible to load a parquet table directly from a file?

牧云@^-^@ submitted on 2019-12-02 01:28:35
If I have a binary data file (it can be converted to CSV format), is there any way to load a parquet table directly from it? Many tutorials show loading a CSV file into a text table and then loading the text table into a parquet table. From an efficiency point of view, is it possible to load a parquet table directly from a binary file like the one I already have, ideally using the create external table command? Or do I need to convert it to a CSV file first? Is there any file format restriction? Unfortunately it is not possible to read from a custom binary format in Impala. You should convert your files to csv, then
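For reference, the two-step route the answer points to can be written as a pair of Impala statements; a minimal sketch, assuming hypothetical table names, columns, and an HDFS path:

```sql
-- Step 1: external text table over the CSV files already in HDFS
-- (table name, columns, and LOCATION are illustrative).
CREATE EXTERNAL TABLE staging_csv (
  id INT,
  name STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging_csv';

-- Step 2: a single CTAS statement reads the text table
-- and writes the same rows out as Parquet files.
CREATE TABLE events_parquet
STORED AS PARQUET
AS SELECT * FROM staging_csv;
```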

java.lang.AbstractMethodError, org.apache.spark.internal.Logging$class.initializeLogIfNecessary

廉价感情. submitted on 2019-12-02 00:31:02
I'm running Kafka producer and consumer code for testing purposes in CDH 5.12, and I'm hitting the error below while running the consumer code. dataSet: org.apache.spark.sql.Dataset[(String, String)] = [key: string, value: string] query: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@109a5573 2018-10-25 10:08:37 ERROR MicroBatchExecution:91 - Query [id = 70bc4f7a-cc41-470d-afd0-d46e5aebf3db, runId = 4d974468-6c6b-47e5-976b-8b9aa98114e2] terminated with error java.lang.AbstractMethodError at org.apache.spark
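An AbstractMethodError on Logging$class.initializeLogIfNecessary is typically a binary-compatibility problem: the Kafka source (or another dependency) was compiled against a different Spark version than the one running on the cluster, so a trait method the JVM expects at runtime is missing. A minimal sketch of the usual guard, assuming an sbt build (the version shown is illustrative; match it to the cluster's Spark):

```scala
// build.sbt: keep every Spark artifact on the exact version and
// Scala binary version that the cluster actually runs.
val sparkVersion = "2.2.0"  // illustrative; use your cluster's version

libraryDependencies ++= Seq(
  // "provided" scope: use the cluster's Spark jars at runtime
  // instead of bundling a second, possibly incompatible copy.
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // The Kafka source must match spark-sql exactly, or trait methods
  // such as initializeLogIfNecessary will not line up at runtime.
  "org.apache.spark" %% "spark-sql-kafka-0-10" % sparkVersion
)
```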

How to check whether a file exists in an HDFS location, using Oozie?

醉酒当歌 submitted on 2019-12-01 22:24:36
Question: How can I check whether a file exists in an HDFS location, using Oozie? In my HDFS location I receive a file like test_08_01_2016.csv at 11 PM on a daily basis. I want to check whether this file exists after 11:15 PM. I can schedule the batch using an Oozie coordinator job, but how can I validate that the file exists in HDFS? Answer 1: You can use an EL expression in Oozie like: <decision name="CheckFile"> <switch> <case to="nextOozieTask"> ${fs:exists('/path/test_08_01_2016.csv')} <!--do note the
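The truncated snippet can be fleshed out into a complete decision node; a minimal sketch of a workflow around it, where the action name's target, the marker path, and the kill transitions are illustrative additions:

```xml
<workflow-app name="file-check-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="CheckFile"/>
    <decision name="CheckFile">
        <switch>
            <!-- fs:exists is a built-in Oozie EL function returning
                 true when the given HDFS path exists -->
            <case to="nextOozieTask">${fs:exists('/path/test_08_01_2016.csv')}</case>
            <default to="fileMissing"/>
        </switch>
    </decision>
    <!-- Stand-in for the real downstream action -->
    <action name="nextOozieTask">
        <fs>
            <touchz path="/path/_file_found"/>
        </fs>
        <ok to="end"/>
        <error to="fileMissing"/>
    </action>
    <kill name="fileMissing">
        <message>test_08_01_2016.csv not found in HDFS</message>
    </kill>
    <end name="end"/>
</workflow-app>
```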

How are Hive SQL queries submitted as MR jobs from the Hive CLI?

余生颓废 submitted on 2019-12-01 21:12:36
I have deployed a CDH 5.9 cluster with MR as the Hive execution engine. I have a Hive table named "users" with 50 rows. The query select * from users always works fine: hive> select * from users; OK Adam 1 38 ATK093 CHEF Benjamin 2 24 ATK032 SERVANT Charles 3 45 ATK107 CASHIER Ivy 4 30 ATK384 SERVANT Linda 5 23 ATK132 ASSISTANT . . . Time taken: 0.059 seconds, Fetched: 50 row(s) But select max(age) from users fails after being submitted as an MR job, and the container log has no information that explains the failure. hive> select max(age) from
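One detail worth knowing here: a bare select * succeeds without touching MapReduce at all, because Hive serves it through its fetch task, while an aggregate like max(age) must launch an MR job. A debugging sketch (not the resolution from the original thread): force even simple scans through MR, so the failure can be isolated to the MR/YARN layer rather than the query itself:

```sql
-- hive.fetch.task.conversion controls when Hive answers a query
-- directly from HDFS instead of launching a MapReduce job;
-- 'none' forces every query through MR.
SET hive.fetch.task.conversion=none;

-- With the setting above, even this plain scan runs as an MR job;
-- if it now fails too, the MR/YARN configuration is the culprit.
SELECT * FROM users LIMIT 5;

-- The aggregate that originally failed; it always requires MR.
SELECT max(age) FROM users;
```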

Can I have multiple spark versions installed in CDH?

一曲冷凌霜 submitted on 2019-12-01 08:48:26
I'm using CDH 5.1.0, which already has a default Spark installed. However, I want to use Spark 1.3. Can I also install this version on CDH 5.1.0? How do I set this up? Will the new version of Spark also be monitored via Cloudera Manager? Yes, you can run any Apache Spark version you like. Just make sure it's built for the version of YARN you have (2.3 for CDH 5.1.0). You can then run your application as a YARN application with spark-submit. (See http://spark.apache.org/docs/latest/running-on-yarn.html.) It will be monitored like any other YARN application. Spark doesn't need to be
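A minimal sketch of what submitting with a separately downloaded Spark build could look like; the install path, class name, and jar are placeholders, not from the original answer:

```bash
# Point the standalone Spark build at the cluster's Hadoop/YARN
# configuration so spark-submit can find the ResourceManager and HDFS.
export HADOOP_CONF_DIR=/etc/hadoop/conf

# Use spark-submit from the downloaded Spark 1.3 distribution rather
# than the CDH one; YARN monitors it like any other application.
/opt/spark-1.3.0-bin-hadoop2.3/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyApp \
  /path/to/my-app.jar
```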

Namenode HA (UnknownHostException: nameservice1)

人走茶凉 submitted on 2019-12-01 04:15:56
We enabled NameNode High Availability through Cloudera Manager, using Cloudera Manager >> HDFS >> Actions >> Enable High Availability, selected the standby NameNode and JournalNodes, and then the nameservice nameservice1. Once the whole process completed, we deployed the client configuration. We tested from a client machine by listing HDFS directories (hadoop fs -ls /), manually failing over to the standby NameNode, and listing the HDFS directories again (hadoop fs -ls /). This test worked perfectly. But when I ran a Hadoop sleep job using the following command, it failed: $ hadoop jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/hadoop-0
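UnknownHostException: nameservice1 generally means the configuration the failing process reads lacks the HA mappings, so the logical nameservice is treated as a literal hostname. For comparison, a minimal sketch of the hdfs-site.xml entries that define those mappings (the NameNode IDs and hosts are placeholders):

```xml
<!-- The logical name clients use in place of a single NameNode host -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1</value>
</property>
<property>
  <name>dfs.ha.namenodes.nameservice1</name>
  <value>namenode1,namenode2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode1</name>
  <value>nn-host1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.nameservice1.namenode2</name>
  <value>nn-host2.example.com:8020</value>
</property>
<!-- Tells clients how to find the currently active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.nameservice1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```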

Datastax Cassandra Driver throwing CodecNotFoundException

ぐ巨炮叔叔 submitted on 2019-12-01 03:49:35
The exact exception is as follows: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal] These are the software versions I am using: Spark 1.5, Datastax Cassandra 3.2.1, CDH 5.5.1. The code I am trying to execute is a Spark program using the Java API; it basically reads data (CSVs) from HDFS and loads it into Cassandra tables. I am using the spark-cassandra-connector. I initially had a lot of issues with a conflict around Google's Guava library, which I was able to resolve by shading the Guava library and
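The exception itself is precise: the driver was asked to map java.math.BigDecimal onto a varchar column and has no codec for that pair, so either the column's CQL type or the bound Java type has to change. A minimal sketch of the latter, against the DataStax Java driver 3.x API, with a hypothetical keyspace, table, and column:

```java
import java.math.BigDecimal;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CodecExample {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("my_ks")) {

            // Hypothetical table: "amount" is declared varchar in CQL.
            PreparedStatement ps = session.prepare(
                "INSERT INTO my_table (id, amount) VALUES (?, ?)");

            BigDecimal parsed = new BigDecimal("42.50");
            // Binding 'parsed' directly would raise CodecNotFoundException
            // [varchar <-> java.math.BigDecimal]; converting to String
            // matches the varchar column.
            session.execute(ps.bind("row-1", parsed.toPlainString()));
        }
    }
}
```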
