cloudera-cdh

Error while inserting from Hive to HBase

Submitted by 本小妞迷上赌 on 2019-12-13 04:54:41
Question: I am using a CDH 4.7.1 cluster. The map phase reaches 100% but the job fails in the reduce part. I have added the section below to hive-site.xml. The actual error message is pasted in the last part of this post. Thanks, any help is appreciated.

<property>
  <name>hive.aux.jars.path</name>
  <value>
    file:///opt/cloudera/parcels/CDH/lib/hbase/hbase.jar,
    file:///opt/cloudera/parcels/CDH-4.7.1-1.cdh4.7.1.p0.47/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.7.1.jar,
    file:///opt/cloudera/parcels/CDH-4.7.1-1.cdh4.7.1.p0.47
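One approach that often comes up for this kind of setup (a sketch, not a confirmed fix for this cluster) is to add the storage-handler jars per session with ADD JAR instead of relying on hive.aux.jars.path; the jar paths and table names below are illustrative, not taken from the question:

    # add the HBase handler jars for the current Hive session, then run the insert
    hive -e "
    ADD JAR /opt/cloudera/parcels/CDH/lib/hbase/hbase.jar;
    ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler.jar;
    ADD JAR /opt/cloudera/parcels/CDH/lib/hbase/lib/zookeeper.jar;
    INSERT OVERWRITE TABLE hbase_backed_table SELECT * FROM hive_source_table;
    "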

Getting an “IOException: Broken pipe” when submitting a Spark job that connects to HBase from PySpark code

Submitted by 感情迁移 on 2019-12-13 04:34:16
Question: I submit a Spark job that does some simple work with PySpark's newAPIHadoopRDD and connects to HBase while the job is running. Our CDH cluster has Kerberos enabled, but I think I have passed authentication. I will show my code, shell command, exception and some CM configuration.

> 19/01/16 10:55:42 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x36850456cea05e5
> 19/01/16 10:55:42 INFO zookeeper.ZooKeeper: Session: 0x36850456cea05e5 closed
> Traceback (most recent call last):
>   File "
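For reference, a hedged sketch of a Kerberos-aware submission; the principal, keytab, paths and script name below are placeholders, not values from the question:

    # authenticate first, then let Spark handle ticket renewal on YARN
    kinit -kt /path/to/user.keytab user@EXAMPLE.COM
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --principal user@EXAMPLE.COM \
      --keytab /path/to/user.keytab \
      --files /etc/hbase/conf/hbase-site.xml \
      --jars /opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar \
      my_hbase_job.py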

How to create an external table in Hive from data on local disk instead of HDFS?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-12 17:25:40
Question: For data on HDFS, we can do

CREATE EXTERNAL TABLE <table> (id INT, name STRING, age INT) LOCATION 'hdfs_path';

but how do I specify a local path in the LOCATION clause above? Thanks.

Answer 1: You can upload the file to HDFS first using "hdfs dfs -put" and then create the Hive external table on top of that. The reason Hive cannot create an external table on a local file is that when Hive processes data, the actual processing happens on the Hadoop cluster, where your local file may not be accessible at
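A minimal sketch of the answer's approach; the paths and table name are illustrative:

    # copy the local file into HDFS, then point the external table at that directory
    hdfs dfs -mkdir -p /user/hive/external/people
    hdfs dfs -put /home/me/people.csv /user/hive/external/people/
    hive -e "
    CREATE EXTERNAL TABLE people (id INT, name STRING, age INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hive/external/people';
    "

Hive's LOAD DATA LOCAL INPATH statement is another way to copy a local file into a table's HDFS location without running hdfs dfs -put yourself.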

Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect

Submitted by 假装没事ソ on 2019-12-12 12:25:32
Question: I am trying to resolve a spark-submit classpath runtime issue for an Apache Tika (> v1.14) parsing job. The problem seems to involve the spark-submit classpath versus my uber-jar.

Platforms: CDH 5.15 (Spark 2.3 added via the CDH docs) and CDH 6 (Spark 2.2 bundled in CDH 6).

I've tried / reviewed:
(Cloudera) Where does spark-submit look for Jar files?
(stackoverflow) resolving-dependency-problems-in-apache-spark
(stackoverflow) Apache Tika ArchiveStreamFactory.detect error

Highlights: Java 8 / Scala 2.11 I
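One way to narrow down a conflict like this (a sketch, assuming the cluster ships an older commons-compress than Tika needs; the jar and class names are illustrative) is to check what the uber-jar actually bundles and then ask Spark to prefer the application's classes over the cluster's:

    # confirm which commons-compress classes the assembly really contains
    unzip -l my-tika-job-assembly.jar | grep commons-compress

    # prefer the application's jars over the CDH-provided ones at runtime
    spark-submit \
      --class com.example.TikaParseJob \
      --conf spark.driver.userClassPathFirst=true \
      --conf spark.executor.userClassPathFirst=true \
      my-tika-job-assembly.jar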

Is it possible to concatenate a string field after GROUP BY in Hive?

Submitted by 我只是一个虾纸丫 on 2019-12-12 10:37:29
Question: I am evaluating Hive and need to do some string field concatenation after a GROUP BY. I found a function named "concat_ws", but it looks like I have to explicitly list all the values to be concatenated. I am wondering if I can do something like the following with concat_ws in Hive. Here is an example: I have a table named "my_table" with two fields, country and city. I want to have only one record per country, and each record will have two fields, country and cities:

select country, concat
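The usual pattern (a sketch, reusing the table and columns from the question) is to pair concat_ws with collect_set, or collect_list if duplicate cities should be kept:

    hive -e "
    SELECT country, concat_ws(',', collect_set(city)) AS cities
    FROM my_table
    GROUP BY country;
    "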

Incorrect configuration: namenode address dfs.namenode.rpc-address is not configured

Submitted by 那年仲夏 on 2019-12-12 07:13:25
Question: I am getting this error when I try to start a DataNode. From what I have read, the RPC parameters are only used for an HA configuration, which I am not setting up (I think).

2014-05-18 18:05:00,589 INFO [main] impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(572)) - DataNode metrics system shutdown complete.
2014-05-18 18:05:00,589 INFO [main] datanode.DataNode (DataNode.java:shutdown(1313)) - Shutdown complete.
2014-05-18 18:05:00,614 FATAL [main] datanode.DataNode (DataNode.java
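In a non-HA setup the DataNode derives the NameNode RPC address from fs.defaultFS (fs.default.name in older configs), so a quick hedged check is to confirm that the DataNode host actually resolves it; the hostname and port below are placeholders:

    # both commands read the *-site.xml files visible to this host
    hdfs getconf -confKey fs.defaultFS     # expect something like hdfs://namenode-host:8020
    hdfs getconf -namenodes                # should list the NameNode host(s)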

How to resolve the "load main class MahoutDriver" error in the Twenty Newsgroups classification example

Submitted by 社会主义新天地 on 2019-12-12 04:49:16
Question: I am trying to run the 20 Newsgroups classification example in Mahout. I have set MAHOUT_LOCAL=true; the classifier doesn't display the confusion matrix and gives the following warnings:

ok. You chose 2 and we'll use naivebayes
creating work directory at /tmp/mahout-work-cloudera
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-cloudera/20news-all
+ mkdir /tmp/mahout-work-cloudera/20news-all
+ cp -R /tmp/mahout-work-cloudera/20news-bydate/20news-bydate
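A hedged set of checks, assuming the driver class simply is not on the classpath; the paths are illustrative and not taken from the question:

    # the mahout launcher builds its classpath from jars under MAHOUT_HOME,
    # so check that the core and examples jars are actually there
    export MAHOUT_HOME=/usr/lib/mahout
    export PATH=$MAHOUT_HOME/bin:$PATH
    ls $MAHOUT_HOME/mahout-*.jar                 # if missing, build with: mvn -DskipTests clean install
    ./examples/bin/classify-20newsgroups.sh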

Restart JobTracker through the Cloudera Manager API

Submitted by 爱⌒轻易说出口 on 2019-12-12 04:06:43
Question: I am trying to restart the MapReduce JobTracker through the Cloudera Manager API. The stats for the JobTracker are as follows:

local-iMac-399:$ curl -u 'admin:admin' 'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roles/mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86'
{
  "name" : "mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86",
  "type" : "JOBTRACKER",
  "serviceRef" : { "clusterName" : "cluster", "serviceName" : "mapreduce" },
  "hostRef" : { "hostId" : "24259373
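For the restart itself, the CM v6 API exposes a roleCommands/restart endpoint that takes the role name in a JSON list; a hedged sketch reusing the host, credentials and role name shown above:

    curl -u 'admin:admin' -X POST \
      -H 'Content-Type: application/json' \
      -d '{ "items": ["mapreduce-JOBTRACKER-0675ebab2b87e3869e0d90167cf4bf86"] }' \
      'http://hadoop-namenode.dev.com:7180/api/v6/clusters/Cluster%201/services/mapreduce/roleCommands/restart'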

Native Impala UDF (C++) randomly returns NULL for the same inputs in the same table across multiple invocations in the same query

Submitted by 烈酒焚心 on 2019-12-11 19:23:21
Question: I have a native Impala UDF (C++) with two functions that are complementary to each other:

String myUDF(BigInt)
BigInt myUDFReverso(String)

myUDF("myInput") gives some output, and myUDFReverso(myUDF("myInput")) should give back myInput.

When I run an Impala query on a Parquet table like this,

select column1, myUDF(column1), length(myUDF(column1)), myUDFreverso(myUDF(column1)) from my_parquet_table order by column1 LIMIT 10;

the output is NULL at random. The output is, say, at the 1st run

Timestamp issue in Hive 1.1

Submitted by 三世轮回 on 2019-12-11 14:21:46
Question: I am facing a very weird issue with Hive in a production environment (Cloudera 5.5) which is basically not reproducible on my local server (I don't know why): for some records I get a wrong timestamp value when inserting from the temp table into the main table, where the string "2017-10-21 23" should be converted into the timestamp "2017-10-21 23:00:00" during insertion. Example:

2017-10-21 23 -> 2017-10-21 22:00:00
2017-10-22 15 -> 2017-10-22 14:00:00

It happens very, very infrequently. Meaning the delta value is
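A hedged way to see what the cluster is doing is to compare the implicit cast with an explicit pattern-based conversion on the same rows; the table and column names below are illustrative:

    hive -e "
    SELECT raw_hour,
           CAST(concat(raw_hour, ':00:00') AS timestamp)            AS implicit_style,
           from_unixtime(unix_timestamp(raw_hour, 'yyyy-MM-dd HH')) AS explicit_pattern
    FROM temp_table
    LIMIT 10;
    "

If the two columns disagree only around certain dates, the string-to-timestamp conversion path (and the server time zone it uses) is worth checking.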