cloudera-cdh

Error while inserting from Hive to HBase

Submitted by 本小妞迷上赌 on 2019-12-13 04:54:41
Question: I am using a CDH 4.7.1 cluster. The map phase reaches 100% but the job fails in the reduce part. I have added the section below to hive-site.xml. The actual error message is pasted in the last part of this post. Thanks, any help is appreciated.

<property>
  <name>hive.aux.jars.path</name>
  <value>
    file:///opt/cloudera/parcels/CDH/lib/hbase/hbase.jar,
    file:///opt/cloudera/parcels/CDH-4.7.1-1.cdh4.7.1.p0.47/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.7.1.jar,
    file:///opt/cloudera/parcels/CDH-4.7.1-1.cdh4.7.1.p0.47
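One approach that often comes up for this kind of setup (a sketch, not a confirmed fix for this cluster) is to add the storage-handler jars per session with ADD JAR instead of relying on hive.aux.jars.path; the jar paths and table names below are illustrative, not taken from the question:

    # add the HBase handler jars for the current Hive session, then run the insert
    hive -e "
    ADD JAR /opt/cloudera/parcels/CDH/lib/hbase/hbase.jar;
    ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar;
    ADD JAR /opt/cloudera/parcels/CDH/lib/hbase/lib/zookeeper.jar;
    INSERT OVERWRITE TABLE hbase_backed_table SELECT * FROM hive_source_table;
    "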

Getting an “IOException: Broken pipe” when submitting a Spark job that connects to HBase from PySpark code

Submitted by 感情迁移 on 2019-12-13 04:34:16
Question: I submit a Spark job that does some simple work with PySpark's newAPIHadoopRDD and connects to HBase while the job is running. Our CDH cluster has Kerberos enabled, but I think I have passed authentication. I will show my code, shell command, exception and some CM configuration.

> 19/01/16 10:55:42 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36850456cea05e5
> 19/01/16 10:55:42 INFO zookeeper.ZooKeeper: Session: 0x36850456cea05e5 closed
> Traceback (most recent call last):
>   File "
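For reference, a hedged sketch of a Kerberos-aware submission; the principal, keytab, paths and script name below are placeholders, not values from the question:

    # authenticate first, then let Spark handle ticket renewal on YARN
    kinit -kt /path/to/user.keytab user@EXAMPLE.COM
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --principal user@EXAMPLE.COM \
      --keytab /path/to/user.keytab \
      --files /etc/hbase/conf/hbase-site.xml \
      --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar \
      my_hbase_job.py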

How to create an external table in Hive from data on local disk instead of HDFS?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-12 17:25:40
Question: For data on HDFS, we can do

CREATE EXTERNAL TABLE <table> (id INT, name STRING, age INT) LOCATION 'hdfs_path';

but how do I specify a local path in the LOCATION clause above? Thanks.

Answer 1: You can upload the file to HDFS first using "hdfs dfs -put" and then create the Hive external table on top of that. The reason Hive cannot create an external table on a local file is that when Hive processes data, the actual processing happens on the Hadoop cluster, where your local file may not be accessible at
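A minimal sketch of the answer's approach; the paths and table name are illustrative:

    # copy the local file into HDFS, then point the external table at that directory
    hdfs dfs -mkdir -p /user/hive/external/people
    hdfs dfs -put /home/me/people.csv /user/hive/external/people/
    hive -e "
    CREATE EXTERNAL TABLE people (id INT, name STRING, age INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hive/external/people';
    "

Hive's LOAD DATA LOCAL INPATH statement is another way to copy a local file into a table's HDFS location without running hdfs dfs -put yourself.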

Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect

Submitted by 假装没事ソ on 2019-12-12 12:25:32
Question: I am trying to resolve a spark-submit classpath runtime issue for an Apache Tika (> v1.14) parsing job. The problem seems to involve the spark-submit classpath versus my uber-jar.

Platforms: CDH 5.15 (Spark 2.3 added via the CDH docs) and CDH 6 (Spark 2.2 bundled in CDH 6).

I've tried / reviewed:
(Cloudera) Where does spark-submit look for Jar files?
(stackoverflow) resolving-dependency-problems-in-apache-spark
(stackoverflow) Apache Tika ArchiveStreamFactory.detect error

Highlights: Java 8 / Scala 2.11 I
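One way to narrow down a conflict like this (a sketch, assuming the cluster ships an older commons-compress than Tika needs; the jar and class names are illustrative) is to check what the uber-jar actually bundles and then ask Spark to prefer the application's classes over the cluster's:

    # confirm which commons-compress classes the assembly really contains
    unzip -l my-tika-job-assembly.jar | grep commons-compress

    # prefer the application's jars over the CDH-provided ones at runtime
    spark-submit \
      --class com.example.TikaParseJob \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      my-tika-job-assembly.jar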

Is it possible to concatenate a string field after GROUP BY in Hive?

Submitted by 我只是一个虾纸丫 on 2019-12-12 10:37:29
Question: I am evaluating Hive and need to do some string field concatenation after a GROUP BY. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering if I can do something like the following with concat_ws in Hive. Here is an example: I have a table named "my_table" with two fields, country and city. I want to have only one record per country, and each record will have two fields, country and cities:

select country, concat
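The usual pattern (a sketch, reusing the table and columns from the question) is to pair concat_ws with collect_set, or collect_list if duplicate cities should be kept:

    hive -e "
    SELECT country, concat_ws(',', collect_set(city)) AS cities
    FROM my_table
    GROUP BY country;
    "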

Incorrect configuration: namenode address dfs.namenode.rpc-address is not configured

Submitted by 那年仲夏 on 2019-12-12 07:13:25
Question: I am getting this error when I try to start a DataNode. From what I have read, the RPC parameters are only used for an HA configuration, which I am not setting up (I think).

2014-05-18 18:05:00,589 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - DataNode metrics system shutdown complete.
2014-05-18 18:05:00,589 INFO [main] datanode.DataNode (DataNode.java:shutdown(1313)) - Shutdown complete.
2014-05-18 18:05:00,614 FATAL [main] datanode.DataNode (DataNode.java
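In a non-HA setup the DataNode derives the NameNode RPC address from fs.defaultFS (fs.default.name in older configs), so a quick hedged check is to confirm that the DataNode host actually resolves it; the hostname and port below are placeholders:

    # both commands read the *-site.xml files visible to this host
    hdfs getconf -confKey fs.defaultFS     # expect something like hdfs://namenode-host:8020
    hdfs getconf -namenodes                # should list the NameNode host(s)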

How to resolve the "load main class MahoutDriver" error in the Twenty Newsgroups classification example

Submitted by 社会主义新天地 on 2019-12-12 04:49:16
Question: I am trying to run the 20 Newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true; the classifier doesn't display the confusion matrix and gives the following warnings:

ok. You chose 2 and we'll use naivebayes
creating work directory at /tmp/mahout-work-cloudera
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-cloudera/20news-all
+ mkdir /tmp/mahout-work-cloudera/20news-all
+ cp -R /tmp/mahout-work-cloudera/20news-bydate/20news-bydate
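A hedged set of checks, assuming the driver class simply is not on the classpath; the paths are illustrative and not taken from the question:

    # the mahout launcher builds its classpath from jars under MAHOUT_HOME,
    # so check that the core and examples jars are actually there
    export MAHOUT_HOME=/usr/lib/mahout
    export PATH=$MAHOUT_HOME/bin:$PATH
    ls $MAHOUT_HOME/mahout-*.jar                 # if missing, build with: mvn -DskipTests clean install
    ./examples/bin/classify-20newsgroups.sh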

Restart JobTracker through the Cloudera Manager API

Submitted by 爱⌒轻易说出口 on 2019-12-12 04:06:43
Question: I am trying to restart the MapReduce JobTracker through the Cloudera Manager API. The stats for the JobTracker are as follows:

local-iMac-399:$ curl -u 'admin:admin' 'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roles/mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86'
{
  "name" : "mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86",
  "type" : "JOBTRACKER",
  "serviceRef" : { "clusterName" : "cluster", "serviceName" : "mapreduce" },
  "hostRef" : { "hostId" : "24259373
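For the restart itself, the CM v6 API exposes a roleCommands/restart endpoint that takes the role name in a JSON list; a hedged sketch reusing the host, credentials and role name shown above:

    curl -u 'admin:admin' -X POST \
      -H 'Content-Type: application/json' \
      -d '{ "items": ["mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86"] }' \
      'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roleCommands/restart'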

Native Impala UDF (C++) randomly returns NULL for the same inputs in the same table across multiple invocations in the same query

Submitted by 烈酒焚心 on 2019-12-11 19:23:21
Question: I have a native Impala UDF (C++) with two functions that are complementary to each other:

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF("myInput") gives some output, and myUDFReverso(myUDF("myInput")) should give back myInput.

When I run an Impala query on a Parquet table like this,

select column1, myUDF(column1), length(myUDF(column1)), myUDFreverso(myUDF(column1)) from my_parquet_table order by column1 LIMIT 10;

the output is NULL at random. The output is, say, at the 1st run

Timestamp issue in Hive 1.1

Submitted by 三世轮回 on 2019-12-11 14:21:46
Question: I am facing a very weird issue with Hive in a production environment (Cloudera 5.5) which is basically not reproducible on my local server (I don't know why): for some records I get a wrong timestamp value when inserting from the temp table into the main table, where the string "2017-10-21 23" should be converted into the timestamp "2017-10-21 23:00:00" during insertion. Example:

2017-10-21 23 -> 2017-10-21 22:00:00
2017-10-22 15 -> 2017-10-22 14:00:00

It happens very, very infrequently. Meaning the delta value is
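A hedged way to see what the cluster is doing is to compare the implicit cast with an explicit pattern-based conversion on the same rows; the table and column names below are illustrative:

    hive -e "
    SELECT raw_hour,
           CAST(concat(raw_hour, ':00:00') AS timestamp)            AS implicit_style,
           from_unixtime(unix_timestamp(raw_hour, 'yyyy-MM-dd HH')) AS explicit_pattern
    FROM temp_table
    LIMIT 10;
    "

If the two columns disagree only around certain dates, the string-to-timestamp conversion path (and the server time zone it uses) is worth checking.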