cloudera-cdh

Datastax Cassandra Driver throwing CodecNotFoundException

江枫思渺然 submitted on 2019-12-01 01:03:18

Question: The exact exception is as follows: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal]. These are the versions of the software I am using: Spark 1.5, Datastax-cassandra 3.2.1, CDH 5.5.1. The code I am trying to execute is a Spark program using the Java API; it basically reads data (CSVs) from HDFS and loads it into Cassandra tables. I am using the spark-cassandra-connector. I had a lot of issues regarding the
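This exception means the driver was asked to bind a java.math.BigDecimal to a column whose CQL type is varchar, and no codec maps between the two. One fix is to change the column type to decimal; the other is to convert the value to a String before binding. A minimal sketch of the conversion side (CodecFix and toCqlVarchar are hypothetical names, not part of the driver):

```java
import java.math.BigDecimal;

public class CodecFix {
    // Hypothetical helper: the target Cassandra column is varchar, so the
    // driver expects a String. Convert the BigDecimal explicitly instead of
    // letting the driver look up a (nonexistent) varchar <-> BigDecimal codec.
    static String toCqlVarchar(BigDecimal value) {
        return value == null ? null : value.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal amount = new BigDecimal("19.99");
        String bound = toCqlVarchar(amount);
        // In driver code this would be boundStatement.setString("amount", bound)
        // rather than boundStatement.setDecimal("amount", amount).
        System.out.println(bound);
    }
}
```

The same idea applies when writing RDD rows through the spark-cassandra-connector: make the Java field types line up with the CQL column types before the save.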

Incorrect configuration: namenode address dfs.namenode.rpc-address is not configured

白昼怎懂夜的黑 submitted on 2019-11-30 09:04:08

I am getting this error when I try to boot up a DataNode. From what I have read, the RPC parameters are only used for an HA configuration, which I am not setting up (I think). 2014-05-18 18:05:00,589 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - DataNode metrics system shutdown complete. 2014-05-18 18:05:00,589 INFO [main] datanode.DataNode (DataNode.java:shutdown(1313)) - Shutdown complete. 2014-05-18 18:05:00,614 FATAL [main] datanode.DataNode (DataNode.java:secureMain(1989)) - Exception in secureMain java.io.IOException: Incorrect configuration: namenode address
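In a non-HA setup the DataNode derives the NameNode's RPC address from fs.defaultFS, so this error usually means core-site.xml is missing from the DataNode's classpath or lacks that property. A sketch of the relevant fragment (the host and port are placeholders for your actual NameNode):

```xml
<!-- core-site.xml on the DataNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode-host:8020</value>
  </property>
</configuration>
```

On older Hadoop 1.x-style configs the equivalent key is fs.default.name; a typo in either key produces the same "dfs.namenode.rpc-address is not configured" failure.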

Pig : result of json loader empty

你离开我真会死。 submitted on 2019-11-29 16:23:53

I'm using the cdh5 quickstart vm and I have a file like this (not shown in full here): {"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": {}, "source": "DBLP" } {"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": {("Albert W. Marshall"), ("Ingram Olkin")}, "source": "DBLP" } and I used this script: books = load 'data/book-seded.json' using JsonLoader
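Two things commonly cause empty results here. First, Pig's builtin JsonLoader needs an explicit schema or it emits null fields. Second, the sample's authors value {("Albert W. Marshall"), ("Ingram Olkin")} is Pig bag literal syntax, not valid JSON, so a strict JSON parser rejects the record. A sketch of the load with a schema, assuming the field names and types below actually match the data:

```pig
-- Builtin JsonLoader with an explicit schema; without it the loader
-- cannot map JSON keys to fields and the relation comes back empty.
books = LOAD 'data/book-seded.json'
        USING JsonLoader('user_id:chararray, type:chararray, title:chararray,
                          year:chararray, publisher:chararray,
                          authors:{(name:chararray)}, source:chararray');
DUMP books;
```

For the second record to load, the authors field would have to be rewritten as a JSON array of objects (e.g. [{"name": "Albert W. Marshall"}]) rather than Pig tuple/bag syntax.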

How to check the Spark version

我只是一个虾纸丫 submitted on 2019-11-29 10:34:24

Question: As titled: how do I know which version of Spark has been installed on CentOS? The current system has cdh5.1.0 installed. Answer 1: If you use spark-shell, the version appears in the banner at startup. Programmatically, SparkContext.version can be used. Answer 2: Open a Spark shell terminal and run sc.version. Answer 3: You can use the spark-submit command: spark-submit --version. Answer 4: In a Spark 2.x program/shell, use spark.version, where the spark variable is a SparkSession object. Using the console logs at start of spark
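The answers above can be summarized as a short cheat sheet, assuming the Spark binaries are on PATH (CDH normally symlinks them into /usr/bin):

```shell
# From the OS shell:
spark-submit --version        # prints the version banner and exits
spark-shell                   # version also appears in the startup banner

# From inside the shell:
#   scala> sc.version         # works on Spark 1.x and later
#   scala> spark.version      # Spark 2.x+, where spark is the SparkSession
```

On a CDH node you can also infer the bundled Spark version from the parcel directory name under /opt/cloudera/parcels.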

How do I set an environment variable in a YARN Spark job?

…衆ロ難τιáo~ submitted on 2019-11-29 08:03:33

I'm attempting to access Accumulo 1.6 from an Apache Spark job (written in Java) by using an AccumuloInputFormat with newAPIHadoopRDD. In order to do this, I have to tell the AccumuloInputFormat where to locate ZooKeeper by calling the setZooKeeperInstance method. This method takes a ClientConfiguration object which specifies various relevant properties. I'm creating my ClientConfiguration object by calling the static loadDefault method. This method is supposed to look in various places for a client.conf file to load its defaults from. One of the places it's supposed to look is $ACCUMULO_CONF
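On YARN, environment variables for the job's containers are set through Spark configuration rather than the submitting shell: spark.executorEnv.[Name] for executors and spark.yarn.appMasterEnv.[Name] for the application master. A sketch of the submit command (the conf directory path, class name, and jar are placeholder assumptions):

```shell
# Make ACCUMULO_CONF_DIR visible inside the YARN containers so
# ClientConfiguration.loadDefault can find client.conf there.
spark-submit \
  --master yarn-cluster \
  --conf spark.executorEnv.ACCUMULO_CONF_DIR=/etc/accumulo/conf \
  --conf spark.yarn.appMasterEnv.ACCUMULO_CONF_DIR=/etc/accumulo/conf \
  --class com.example.MyAccumuloJob myjob.jar
```

Exporting the variable in the driver's shell only affects the driver process; the executors run on other hosts and see only what YARN launches them with.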

Class com.hadoop.compression.lzo.LzoCodec not found for Spark on CDH 5?

☆樱花仙子☆ submitted on 2019-11-27 21:33:18

I have been working on this problem for two days and still have not found a fix. Problem: Our Spark, installed via the newest CDH 5, always complains about a missing LzoCodec class, even after I installed HADOOP_LZO through Parcels in Cloudera Manager. We are running MR1 on CDH 5.0.0-1.cdh5.0.0.p0.47. Attempted fixes: The configurations from the official CDH documentation on 'Using the LZO Parcel' were also added, but the problem is still there. Most of the posts I found via Google give advice similar to the above. I also suspect that Spark is trying to run against YARN, which is not activated there; but I
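A common cause is that the LZO parcel's jars and native libraries are on the Hadoop classpath but not on Spark's. One sketch of a workaround, appending them in spark-env.sh (the parcel path is an assumption for a default parcel layout; verify it on your hosts):

```shell
# spark-env.sh: make the HADOOP_LZO parcel visible to Spark (Spark 1.x-era
# variables; SPARK_CLASSPATH was later superseded by spark.*.extraClassPath).
HADOOP_LZO_HOME=/opt/cloudera/parcels/HADOOP_LZO
export SPARK_CLASSPATH="$SPARK_CLASSPATH:$HADOOP_LZO_HOME/lib/*"
export SPARK_LIBRARY_PATH="$SPARK_LIBRARY_PATH:$HADOOP_LZO_HOME/lib/native"
```

Alternatively, if LZO is not actually needed, removing com.hadoop.compression.lzo.LzoCodec from io.compression.codecs in core-site.xml stops the lookup altogether.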

Spark : how to run spark file from spark shell

谁说胖子不能爱 submitted on 2019-11-27 06:16:09

I am using CDH 5.2. I am able to use spark-shell to run the commands. How can I run a file (file.spark) which contains spark commands? Is there any way to run/compile Scala programs in CDH 5.2 without sbt? Thanks in advance. To load an external file from spark-shell, simply do :load PATH_TO_FILE. This will call everything in your file. I don't have a solution for your SBT question though, sorry :-) Ziyao Li: On the command line, you can use spark-shell -i file.scala to run code written in file.scala. javadba: You can use either sbt or maven to compile spark programs. Simply add the spark as
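For the sbt route, a minimal build definition looks like the sketch below. The versions are assumptions chosen to match CDH 5.2's bundled Spark (1.1.x on Scala 2.10) and should be checked against the cluster:

```scala
// build.sbt sketch for compiling a Spark program with sbt.
// "provided" keeps the Spark jars out of the assembly, since the
// cluster supplies them at runtime.
name := "spark-job"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0" % "provided"
```

After `sbt package`, the resulting jar can be submitted with spark-submit, while :load and spark-shell -i remain the quickest options for uncompiled script files.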

Error in Hive Query while joining tables

不羁岁月 submitted on 2019-11-27 01:40:54

Question: I am unable to pass the equality check using the HIVE query below. I have 3 tables and I want to join them. I am trying as below, but I get this error: FAILED: Error in semantic analysis: Line 3:40 Both left and right aliases encountered in JOIN 'visit_date' select t1.*, t99.* from table1 t1 JOIN (select v3.*, t3.* from table2 v3 JOIN table3 t3 ON ( v3.AS_upc= t3.upc_no AND v3.start_dt <= t3.visit_date AND v3.end_dt >= t3.visit_date AND v3.adv_price <= t3.comp_price ) ) t99 ON (t1.comp_store_id
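This error is the classic symptom of Hive's restriction (lifted only in much later versions) that a JOIN ... ON clause may contain only equality predicates referencing both sides. The usual rewrite moves the range comparisons into WHERE. A sketch of the inner join, using the column names from the question:

```sql
-- Only the equi-join condition stays in ON; Hive applies the
-- remaining range predicates as a post-join filter in WHERE.
SELECT v3.*, t3.*
FROM table2 v3
JOIN table3 t3
  ON (v3.AS_upc = t3.upc_no)
WHERE v3.start_dt  <= t3.visit_date
  AND v3.end_dt    >= t3.visit_date
  AND v3.adv_price <= t3.comp_price;
```

The outer join to table1 is truncated in the question, but the same rule applies to it: equality tests in ON, everything else in WHERE.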