hortonworks-data-platform

adding multiple jars in Oozie-Spark action

Submitted by 本秂侑毒 on 2019-12-24 08:30:13
Question: I'm using HDP 2.6, which has Oozie 4.2 and Spark2 installed. I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html for adding the Spark2 libraries to Oozie 4.2. I then submit the job with this addition: oozie.action.sharelib.for.spark=spark2 The error I'm getting is: 2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2]
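For reference, below is a minimal job.properties sketch that selects the Spark2 sharelib for the Oozie Spark action; the host names, port numbers, and workflow path are placeholders, not values taken from the question.

nameNode=hdfs://<namenode-host>:8020
jobTracker=<resourcemanager-host>:8050
oozie.use.system.libpath=true
# pick the spark2 sharelib instead of the default spark one
oozie.action.sharelib.for.spark=spark2
oozie.wf.application.path=${nameNode}/user/${user.name}/workflow2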

Storm UI throwing “Offset lags for kafka not supported for older versions. Please update kafka spout to latest version.”

Submitted by 可紊 on 2019-12-24 08:14:10
Question: I have upgraded my HDP cluster to 2.5 and upgraded the topology dependencies storm-core and storm-kafka to 1.0.1. After deploying the new topology with the 1.0.1 dependencies everything works as expected on the back end, but the Storm UI always shows zero for "Acked", "Emitted", "Transferred", etc. The Storm UI also shows the message "Offset lags for kafka not supported for older versions. Please update kafka spout to latest version." under "Topology spouts lag error". What does it mean?
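The lag panel in the Storm UI reads offset information that only newer Kafka spouts expose, so the message appears to mean the topology is still running a spout the UI cannot query. One commonly suggested direction, stated here as an assumption to verify against the Storm build shipped with HDP 2.5 rather than a confirmed fix from this thread, is to move from the old storm-kafka spout to the storm-kafka-client spout, e.g. with a dependency along these lines:

<dependency>
  <groupId>org.apache.storm</groupId>
  <artifactId>storm-kafka-client</artifactId>
  <!-- placeholder: match the Storm version (and repository) used by the cluster -->
  <version>${storm.version}</version>
</dependency>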

Job via Oozie HDP 2.1 not creating job.splitmetainfo

Submitted by ぃ、小莉子 on 2019-12-24 03:20:49
Question: When trying to execute a Sqoop job that has my Hadoop program passed as a jar file via the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs run as the same Hadoop user execute successfully. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo at org.apache.hadoop.mapreduce.v2

Sqoop import to HCatalog/Hive - table not visible

Submitted by 此生再无相见时 on 2019-12-24 01:57:17
Question: HDP-2.4.2.0-258 installed using Ambari 2.2.2.0. I have to import several SQL Server schemas which should be accessible via Hive, Pig, MR, and any third party (in the future). I decided to import into HCatalog. Sqoop provides ways to import to Hive or HCatalog; I assume that if I import to HCatalog, the same table will be accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption). Questions: If imported to Hive directly, will the table be available to Pig and MR? If imported to HCatalog, what
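Since HCatalog is a layer over the Hive metastore, a table created through it is an ordinary Hive table, which is what makes it reachable from Hive, from Pig (HCatLoader), and from MapReduce (HCatInputFormat). A hedged sketch of such an import follows; the connection string, credentials, and table names are placeholders.

sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;databaseName=<db>" \
  --username <user> -P \
  --table CUSTOMERS \
  --hcatalog-database default \
  --hcatalog-table customers \
  --create-hcatalog-table \
  --num-mappers 4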

Spark SQL “Limit”

Submitted by 随声附和 on 2019-12-23 16:28:29
Question: Environment: Spark 1.6 on Hadoop, Hortonworks Data Platform 2.5. I have a table with 10 billion records and I would like to take 300 million records and move them to a temporary table. sqlContext.sql("select ....from my_table limit 300000000").repartition(50).write.saveAsTable("temporary_table") I saw that the LIMIT keyword actually makes Spark use only one executor! This means moving 300 million records to one node and writing them back to Hadoop. How can I avoid this reduce but still get
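One workaround sketch, not taken from the question itself: draw the subset with sample() instead of LIMIT, so rows are selected in parallel across partitions rather than funneled through a single task. The result is approximate (about 300 million rows, not exactly), the fraction 0.03 is simply 300 million divided by 10 billion, the table names follow the question, and the Java API is used here although the original snippet is Scala.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class SampleInsteadOfLimit {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("sample-instead-of-limit");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext sqlContext = new HiveContext(sc.sc());

        // sample() keeps roughly 3% of the 10B rows (~300M) in parallel,
        // unlike LIMIT, which pulls everything through one partition.
        DataFrame subset = sqlContext.table("my_table")
                .sample(false, 0.03)
                .repartition(50);

        subset.write().saveAsTable("temporary_table");
    }
}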

Kafka console producer Error in Hortonworks HDP 2.3 Sandbox

Submitted by 六眼飞鱼酱① on 2019-12-23 15:53:39
Question: I have searched all over and couldn't find the cause of this error. I checked this Stack Overflow issue but it is not my problem. I started a ZooKeeper server; the command to start it was bin/zookeeper-server-start.sh config/zookeeper.properties Then I SSHed into the VM using PuTTY and started the Kafka server using $ bin/kafka-server-start.sh config/server.properties Then I created a Kafka topic, and when I list the topics, it appears. Then I opened another PuTTY session and started kafka-console-producer
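For reference, HDP sandboxes normally configure the Kafka broker to listen on port 6667 rather than Kafka's default 9092, and a --broker-list pointing at the wrong port is a frequent cause of console-producer errors there; whether that applies to this truncated excerpt is an assumption. A typical invocation on the sandbox looks like the line below, with the host name and topic as placeholders.

bin/kafka-console-producer.sh --broker-list <sandbox-host>:6667 --topic <topic-name>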

Getting the Tool Interface warning even though it is implemented

Submitted by 自古美人都是妖i on 2019-12-23 09:32:11
Question: I have a very simple "Hello world" style map/reduce job.

public class Tester extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(getClass());
        getConf().set("mapreduce.job.queuename", "adhoc");
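As far as the excerpt shows, the job is built from a fresh new Configuration() rather than from the getConf() instance that ToolRunner populated, and that mismatch is a common cause of the Tool warning even when Tool is implemented. A minimal sketch of the usual pattern follows; the class name and queue name come from the question, and the rest (main method, job completion handling) is assumed.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Tester extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s [generic options] <input> <output>\n",
                    getClass().getSimpleName());
            ToolRunner.printGenericCommandUsage(System.err);
            return -1;
        }
        // Reuse the Configuration that ToolRunner already ran through
        // GenericOptionsParser instead of creating a new one.
        getConf().set("mapreduce.job.queuename", "adhoc");
        Job job = Job.getInstance(getConf());
        job.setJarByClass(getClass());
        // ... mapper/reducer and input/output setup elided ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Tester(), args));
    }
}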

Making spark use /etc/hosts file for binding in YARN cluster mode

Submitted by 笑着哭i on 2019-12-23 08:00:07
Question: I have a Spark cluster set up on machines with two network interfaces, one public and one private. The /etc/hosts file on the cluster has the internal IP of every other machine in the cluster, like so: internal_ip FQDN However, when I request a SparkContext via pyspark in YARN client mode (pyspark --master yarn --deploy-mode client), Akka binds to the public IP and a timeout occurs. 15/11/07 23:29:23 INFO Remoting: Starting remoting 15/11/07 23:29:23 INFO Remoting: Remoting started; listening on
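A workaround often tried in this situation, noted here as an assumption rather than a fix confirmed by this excerpt, is to pin the driver's local address to the private interface before the Akka remoting layer starts, for example:

export SPARK_LOCAL_IP=<internal_ip>
pyspark --master yarn --deploy-mode client --conf spark.driver.host=<internal_fqdn>

SPARK_LOCAL_IP controls which address Spark binds locally, and spark.driver.host is the address the driver advertises to the executors; both placeholders would be the private values from /etc/hosts.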

sqlContext HiveDriver error on SQLException: Method not supported

Submitted by 爷,独闯天下 on 2019-12-23 06:48:30
Question: I have been trying to use sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver") to get a Hive table into Spark, without any success. I have done research and read the following: How to connect to remote hive server from spark Spark 1.5.1 not working with hive jdbc 1.2.0 http://belablotski.blogspot.in/2016/01/access-hive-tables-from-spark-using.html I am using the latest Hortonworks Sandbox 2.6 and asked the same question in the community there: https://community.hortonworks.com
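For comparison, here is a minimal sketch of reading the table through HiveContext instead of the Hive JDBC driver, which sidesteps the JDBC data source path where the "Method not supported" SQLException is raised; the table name is a placeholder, and the Java API is shown even though the question's snippet is pyspark-style.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class HiveTableRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("hive-table-read");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // HiveContext goes through the Hive metastore directly,
        // so the HiveServer2 JDBC driver is not involved at all.
        HiveContext hiveContext = new HiveContext(sc.sc());
        DataFrame df = hiveContext.table("default.my_table");
        df.show(10);
    }
}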