cloudera-cdh

Loading JSON file with serde in Cloudera

Submitted by 百般思念 on 2019-12-31 07:18:15
Question: I am trying to work with a JSON file with this bag structure:

{ "user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [ { "name": "null" } ], "source": "DBLP" }
{ "user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press", "authors": [ { "name": "Albert W.
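The records above are newline-delimited JSON (one complete object per line), which is the shape Hive JSON SerDes such as `org.openx.data.jsonserde.JsonSerDe` expect. A minimal Python sketch for validating a file of that shape before loading it into Hive (the second record is a hypothetical stand-in, since the question's sample is truncated):

```python
import json

def parse_json_lines(text):
    """Parse newline-delimited JSON: one record per line, blank lines skipped."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

# First record copied from the question; the second is a hypothetical stand-in.
sample = """\
{"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name": "null"}], "source": "DBLP"}
{"user_id": "example", "type": "Book", "title": "Placeholder Title", "year": "1979", "publisher": "Academic Press", "authors": [{"name": "A. Author"}], "source": "DBLP"}
"""

records = parse_json_lines(sample)
print(records[0]["user_id"], records[0]["authors"][0]["name"])
```

If every line parses, the file is at least structurally safe for a line-oriented SerDe; a `json.JSONDecodeError` pinpoints the offending record.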

Flume - TwitterSource language filter

Submitted by 穿精又带淫゛_ on 2019-12-24 13:08:16
Question: I would like to ask for your help with the following case. I'm currently using Cloudera CDH 5.1.2, and I tried to collect Twitter data using Flume as described in the following posts (Cloudera): http://blog.cloudera.com/blog/2012/10/analyzing-twitter-data-with-hadoop-part-2-gathering-data-with-flume/ github.com/cloudera/cdh-twitter-example I downloaded the source and rebuilt flume-sources after updating the versions in pom.xml: <flume.version>1.5.0-cdh5.1.2</flume.version> <hadoop.version
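Rebuilding `flume-sources` against the running cluster means pinning the version properties in `pom.xml` to the installed CDH release. The question's snippet is cut off after the Flume property; a sketch of what the properties block typically looks like (the Hadoop version string is an assumption based on CDH 5.1.2 packaging, not taken from the question):

```xml
<properties>
  <!-- Match the artifact versions shipped with the installed CDH release. -->
  <flume.version>1.5.0-cdh5.1.2</flume.version>
  <hadoop.version>2.3.0-cdh5.1.2</hadoop.version>
</properties>
```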

Oozie - Task Logs Do not Display

Submitted by 限于喜欢 on 2019-12-24 11:34:35
Question: Using CDH 5, when I run my Oozie workflow I no longer see log statements from my mappers (log4j, slf4j). I even tried System.out.println; I still don't see the statements. Is there a setting I'm missing? Answer 1: It turned out that the logs are still there, except that you need to point your browser to them manually. For example, clicking on a map-reduce action still opens a job log page, something like http://localhost:50030/jobdetails.jsp?jobid=job_201510061631_2112. However to get the result for

Kafka on Cloudera - test=TOPIC_AUTHORIZATION_FAILED

Submitted by 北城余情 on 2019-12-24 10:35:17
Question: We just upgraded from CDH 5.3.6 to 5.10.0 and started getting errors when trying to write to Kafka topics. We have the default settings on everything, with no SSL or Kerberos authentication enabled. When I use the console producer to write to one of my topics, I get this error: /usr/bin/kafka-console-producer --broker-list=myhost1.dev.com:9092,myhost2.dev.com:9092 --topic test 17/03/06 21:00:57 INFO utils.AppInfoParser: Kafka version : 0.10.0-kafka-2.1.0 17/03/06 21:00:57 INFO utils.AppInfoParser:
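A TOPIC_AUTHORIZATION_FAILED error on a cluster with "no security enabled" usually means the upgrade switched an authorizer on at the broker side. A hedged sketch of the broker settings to check (these are standard Kafka 0.10 broker properties; whether the CDH upgrade actually set them on this cluster is an assumption):

```properties
# If an authorizer was enabled by the upgrade, either remove it...
# authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
# ...or allow requests that match no ACL to proceed:
allow.everyone.if.no.acl.found=true
# Unauthenticated clients appear as this principal:
super.users=User:ANONYMOUS
```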

My cdh5.2 cluster get FileNotFoundException when running hbase MR jobs

Submitted by 纵饮孤独 on 2019-12-22 22:25:09
Question: My cdh5.2 cluster has a problem running hbase MR jobs. For example, I added the hbase classpath to the hadoop classpath: vi /etc/hadoop/conf/hadoop-env.sh and added the line: export HADOOP_CLASSPATH="/usr/lib/hbase/bin/hbase classpath:$HADOOP_CLASSPATH" And when I run: hadoop jar /usr/lib/hbase/hbase-server-0.98.6-cdh5.2.1.jar rowcounter "mytable" I get the following exception: 14/12/09 03:44:02 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:java
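Note that the export quoted in the question puts the literal string `/usr/lib/hbase/bin/hbase classpath` on the classpath rather than that command's output. The usual fix is command substitution, sketched here for `hadoop-env.sh` (assuming the stock CDH package path):

```shell
# hadoop-env.sh: expand the OUTPUT of `hbase classpath`, not the literal text
export HADOOP_CLASSPATH="$(/usr/lib/hbase/bin/hbase classpath):$HADOOP_CLASSPATH"
```

Without the `$(...)`, the HBase jars never reach the MR job's classpath, which is consistent with the FileNotFoundException in the title.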

spark-submit yarn-cluster with --jars does not work?

Submitted by 99封情书 on 2019-12-22 09:45:43
Question: I am trying to submit a Spark job to the CDH YARN cluster via the following commands. I have tried several combinations and none of them work... I now have all the poi jars located both in my local /root and in HDFS /user/root/lib, hence I have tried the following: spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel_sc.jar --jars /root/poi-3.12.jars, /root/poi-ooxml-3.12.jar, /root/poi-ooxml-schemas-3.12.jar spark-submit --master yarn-cluster --class "ReadExcelSC" ./excel
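Two common causes with `--jars` are visible in the command as quoted: the list must be comma-separated with no spaces, and `spark-submit` stops parsing its own options at the application jar, so `--jars` must come before `./excel_sc.jar`. A command sketch under those assumptions (paths taken from the question; note the quoted command also has a `.jars` typo in the first path):

```shell
spark-submit --master yarn-cluster --class "ReadExcelSC" \
  --jars /root/poi-3.12.jar,/root/poi-ooxml-3.12.jar,/root/poi-ooxml-schemas-3.12.jar \
  ./excel_sc.jar
```

Anything placed after the application jar is passed to the application's `main` as arguments, not interpreted by `spark-submit`.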

Exclusion of dependency of spark-core in CDH

Submitted by ⅰ亾dé卋堺 on 2019-12-20 05:59:28
Question: I'm using Structured Spark Streaming to write data coming from Kafka to HBase. My cluster distribution is Hadoop 3.0.0-cdh6.2.0, and I'm using Spark 2.4.0. My code is like below: val df = spark .readStream .format("kafka") .option("kafka.bootstrap.servers", bootstrapServers) .option("subscribe", topic) .option("failOnDataLoss", false) .load() .selectExpr("CAST(key AS STRING)" , "CAST(value AS STRING)") .as(Encoders.STRING) df.writeStream .foreachBatch { (batchDF: Dataset[Row], batchId: Long
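The title asks about excluding spark-core as a dependency on CDH, which typically means a transitive Spark version pulled in by an HBase connector conflicts with the cluster's Spark 2.4.0. A hedged Maven sketch of such an exclusion (the connector coordinates and versions here are illustrative assumptions, not taken from the question):

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-spark</artifactId>
  <version>2.1.0-cdh6.2.0</version>
  <exclusions>
    <!-- Use the cluster-provided Spark instead of the transitive one. -->
    <exclusion>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Marking the cluster's own Spark artifacts as `provided` scope is the usual companion to this exclusion.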

CDH Community Edition Rolling Upgrade from 5.7 to 5.13

Submitted by 狂风中的少年 on 2019-12-13 17:42:58
Question: Can someone let me know how I can perform a rolling upgrade of CDH from 5.7 to 5.13? I could not find much in the Cloudera documentation regarding rolling upgrades of the CDH Community Edition. EDIT As per the discussion below, I can do upgrades manually by stopping, upgrading (via "1 Click Install"), and starting the nodes. In a cluster like below: 3 HBase Masters (1 active & 2 standby) 4 Region Servers 4 Data Nodes 1 primary & 1 secondary Name Node 3 Journal Nodes 4 NodeManagers 3 Resource Managers (1 Active &
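The manual approach described in the EDIT amounts to repeating a stop/upgrade/start cycle on one node of each role at a time, relying on the standbys to keep the service up. A sketch for a single worker node (service and package names are illustrative and depend on the installed roles):

```shell
# On ONE DataNode/RegionServer at a time, then verify health before the next:
sudo service hbase-regionserver stop
sudo service hadoop-hdfs-datanode stop
sudo yum update 'hadoop-*' 'hbase-*'    # after pointing yum at the CDH 5.13 repo
sudo service hadoop-hdfs-datanode start
sudo service hbase-regionserver start
```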

Where are the start/stop hadoop hdfs/mapred scripts on CDH5

Submitted by 此生再无相见时 on 2019-12-13 16:59:05
Question: The documentation for CDH4 refers to the /etc/init.d/hadoop-* scripts, but these no longer exist in CDH5. I have waded through the documentation but did not succeed in finding or understanding the CDH5 equivalent. The closest I could find was for the SCM manager: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM5/latest/Cloudera-Manager-Administration-Guide/cm5ag_agents.html Pointers to, and an explanation of, the new process(es) would be appreciated. Answer 1: I received a response
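Under Cloudera Manager (the SCM agent page linked above), daemons are supervised by the CM agent and started/stopped from the CM UI or API rather than init scripts. On package-only installs without CM, CDH5 ships per-daemon init scripts split by role; a sketch of the likely equivalents (exact script names depend on which role packages are installed):

```shell
sudo service hadoop-hdfs-namenode start
sudo service hadoop-hdfs-datanode start
sudo service hadoop-yarn-resourcemanager start
sudo service hadoop-yarn-nodemanager start
```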