cloudera-cdh

Loading JSON file with serde in Cloudera

Submitted by 百般思念 on 2019-12-31 07:18:15
Question: I am trying to work with a JSON file with this bag structure:

{ "user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [ { "name": "null" } ], "source": "DBLP" }
{ "user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [ { "name": "Albert W.
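The records above are newline-delimited JSON (one complete object per line), which is the shape Hive JSON SerDes such as `org.openx.data.jsonserde.JsonSerDe` expect. A minimal Python sketch for validating a file of that shape before loading it into Hive (the second record is a hypothetical stand-in, since the question's sample is truncated):

```python
import json

def parse_json_lines(text):
    """Parse newline-delimited JSON: one record per line, blank lines skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

# First record copied from the question; the second is a hypothetical stand-in.
sample = """\
{"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name": "null"}], "source": "DBLP"}
{"user_id": "example", "type": "Book", "title": "Placeholder Title", "year": "1979", "publisher": "Academic Press", "authors": [{"name": "A. Author"}], "source": "DBLP"}
"""

records = parse_json_lines(sample)
print(records[0]["user_id"], records[0]["authors"][0]["name"])
```

If every line parses, the file is at least structurally safe for a line-oriented SerDe; a `json.JSONDecodeError` pinpoints the offending record.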

Flume - TwitterSource language filter

Submitted by 穿精又带淫゛_ on 2019-12-24 13:08:16
Question: I would like to ask for your help with the following case. I'm currently using Cloudera CDH 5.1.2, and I tried to collect Twitter data using Flume as described in the following posts (Cloudera): http://blog.cloudera.com/blog/2012/10/analyzing-twitter-data-with-hadoop-part-2-gathering-data-with-flume/ github.com/cloudera/cdh-twitter-example I downloaded the source and rebuilt flume-sources after updating the versions in pom.xml: <flume.version>1.5.0-cdh5.1.2</flume.version> <hadoop.version
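Rebuilding `flume-sources` against the running cluster means pinning the version properties in `pom.xml` to the installed CDH release. The question's snippet is cut off after the Flume property; a sketch of what the properties block typically looks like (the Hadoop version string is an assumption based on CDH 5.1.2 packaging, not taken from the question):

```xml
<properties>
  <!-- Match the artifact versions shipped with the installed CDH release. -->
  <flume.version>1.5.0-cdh5.1.2</flume.version>
  <hadoop.version>2.3.0-cdh5.1.2</hadoop.version>
</properties>
```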

Oozie - Task Logs Do not Display

Submitted by 限于喜欢 on 2019-12-24 11:34:35
Question: Using CDH 5, when I run my Oozie workflow I no longer see log statements from my mappers (log4j, slf4j). I even tried System.out.println; I still don't see the statements. Is there a setting I'm missing? Answer 1: It turned out that the logs are still there, except that you need to point your browser to them manually. For example, clicking on a map-reduce action still opens a job log page, something like http://localhost:50030/jobdetails.jsp?jobid=job_201510061631_2112. However to get the result for

Kafka on Cloudera - test=TOPIC_AUTHORIZATION_FAILED

Submitted by 北城余情 on 2019-12-24 10:35:17
Question: We just upgraded from CDH 5.3.6 to 5.10.0 and started getting errors when trying to write to Kafka topics. We have the default settings on everything, with no SSL or Kerberos authentication enabled. When I use the console producer to write to one of my topics, I get this error: /usr/bin/kafka-console-producer --broker-list=myhost1.dev.com:9092,myhost2.dev.com:9092 --topic test 17/03/06 21:00:57 INFO utils.AppInfoParser: Kafka version : 0.10.0-kafka-2.1.0 17/03/06 21:00:57 INFO utils.AppInfoParser:
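A TOPIC_AUTHORIZATION_FAILED error on a cluster with "no security enabled" usually means the upgrade switched an authorizer on at the broker side. A hedged sketch of the broker settings to check (these are standard Kafka 0.10 broker properties; whether the CDH upgrade actually set them on this cluster is an assumption):

```properties
# If an authorizer was enabled by the upgrade, either remove it...
# authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
# ...or allow requests that match no ACL to proceed:
allow.everyone.if.no.acl.found=true
# Unauthenticated clients appear as this principal:
super.users=User:ANONYMOUS
```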

My cdh5.2 cluster get FileNotFoundException when running hbase MR jobs

Submitted by 纵饮孤独 on 2019-12-22 22:25:09
Question: My cdh5.2 cluster has a problem running hbase MR jobs. For example, I added the hbase classpath to the hadoop classpath: vi /etc/hadoop/conf/hadoop-env.sh and added the line: export HADOOP_CLASSPATH="/usr/lib/hbase/bin/hbase classpath:$HADOOP_CLASSPATH" And when I run: hadoop jar /usr/lib/hbase/hbase-server-0.98.6-cdh5.2.1.jar rowcounter "mytable" I get the following exception: 14/12/09 03:44:02 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java
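Note that the export quoted in the question puts the literal string `/usr/lib/hbase/bin/hbase classpath` on the classpath rather than that command's output. The usual fix is command substitution, sketched here for `hadoop-env.sh` (assuming the stock CDH package path):

```shell
# hadoop-env.sh: expand the OUTPUT of `hbase classpath`, not the literal text
export HADOOP_CLASSPATH="$(/usr/lib/hbase/bin/hbase classpath):$HADOOP_CLASSPATH"
```

Without the `$(...)`, the HBase jars never reach the MR job's classpath, which is consistent with the FileNotFoundException in the title.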

spark-submit yarn-cluster with --jars does not work?

Submitted by 99封情书 on 2019-12-22 09:45:43
Question: I am trying to submit a Spark job to the CDH YARN cluster via the following commands. I have tried several combinations and none of them work... I now have all the poi jars located both in my local /root and in HDFS /user/root/lib, hence I have tried the following: spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars /root/poi-3.12.jars, /root/poi-ooxml-3.12.jar, /root/poi-ooxml-schemas-3.12.jar spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel
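Two common causes with `--jars` are visible in the command as quoted: the list must be comma-separated with no spaces, and `spark-submit` stops parsing its own options at the application jar, so `--jars` must come before `./excel_sc.jar`. A command sketch under those assumptions (paths taken from the question; note the quoted command also has a `.jars` typo in the first path):

```shell
spark-submit --master yarn-cluster --class "ReadExcelSC" \
  --jars /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
  ./excel_sc.jar
```

Anything placed after the application jar is passed to the application's `main` as arguments, not interpreted by `spark-submit`.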

Exclusion of dependency of spark-core in CDH

Submitted by ⅰ亾dé卋堺 on 2019-12-20 05:59:28
Question: I'm using Structured Spark Streaming to write data coming from Kafka to HBase. My cluster distribution is Hadoop 3.0.0-cdh6.2.0, and I'm using Spark 2.4.0. My code is like below: val df = spark .readStream .format("kafka") .option("kafka.bootstrap.servers", bootstrapServers) .option("subscribe", topic) .option("failOnDataLoss", false) .load() .selectExpr("CAST(key AS STRING)" , "CAST(value AS STRING)") .as(Encoders.STRING) df.writeStream .foreachBatch { (batchDF: Dataset[Row], batchId: Long
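The title asks about excluding spark-core as a dependency on CDH, which typically means a transitive Spark version pulled in by an HBase connector conflicts with the cluster's Spark 2.4.0. A hedged Maven sketch of such an exclusion (the connector coordinates and versions here are illustrative assumptions, not taken from the question):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <version>2.1.0-cdh6.2.0</version>
  <exclusions>
    <!-- Use the cluster-provided Spark instead of the transitive one. -->
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Marking the cluster's own Spark artifacts as `provided` scope is the usual companion to this exclusion.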

CDH Community Edition Rolling Upgrade from 5.7 to 5.13

Submitted by 狂风中的少年 on 2019-12-13 17:42:58
Question: Can someone let me know how I can perform a rolling upgrade of CDH from 5.7 to 5.13? I could not find much in the Cloudera documentation regarding rolling upgrades of the CDH Community Edition. EDIT As per the discussion below, I can do upgrades manually by stopping, upgrading (via "1 Click Install"), and starting the nodes. In a cluster like below: 3 HBase Masters (1 active & 2 standby) 4 Region Servers 4 Data Nodes 1 primary & 1 secondary Name Node 3 Journal Nodes 4 NodeManagers 3 Resource Managers (1 Active &
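The manual approach described in the EDIT amounts to repeating a stop/upgrade/start cycle on one node of each role at a time, relying on the standbys to keep the service up. A sketch for a single worker node (service and package names are illustrative and depend on the installed roles):

```shell
# On ONE DataNode/RegionServer at a time, then verify health before the next:
sudo service hbase-regionserver stop
sudo service hadoop-hdfs-datanode stop
sudo yum update 'hadoop-*' 'hbase-*'    # after pointing yum at the CDH 5.13 repo
sudo service hadoop-hdfs-datanode start
sudo service hbase-regionserver start
```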

Where are the start/stop hadoop hdfs/mapred scripts on CDH5

Submitted by 此生再无相见时 on 2019-12-13 16:59:05
Question: The documentation for CDH4 refers to the /etc/init.d/hadoop-* scripts, but these no longer exist in CDH5. I have waded through the documentation but did not succeed in finding or understanding the CDH5 equivalent. The closest I could find was for the SCM manager: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Administration-Guide/cm5ag_agents.html Pointers to, and an explanation of, the new process(es) would be appreciated. Answer 1: I received a response
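Under Cloudera Manager (the SCM agent page linked above), daemons are supervised by the CM agent and started/stopped from the CM UI or API rather than init scripts. On package-only installs without CM, CDH5 ships per-daemon init scripts split by role; a sketch of the likely equivalents (exact script names depend on which role packages are installed):

```shell
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
```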