hortonworks-data-platform

Spark on YARN too less vcores used

Question: I'm using Spark in a YARN cluster (HDP 2.4) with the following settings:

1 master node: 64 GB RAM (50 GB usable), 24 cores (19 cores usable)
5 slave nodes: 64 GB RAM (50 GB usable) each, 24 cores (19 cores usable) each

YARN settings:
memory of all containers (of one host): 50 GB
minimum container size = 2 GB
maximum container size = 50 GB
vcores = 19
minimum #vcores/container = 1
maximum #vcores/container = 19

When I run my Spark application with the command spark-submit --num-executors 30 -
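The settings listed above lend themselves to a quick capacity estimate. The following is only a rough sketch: the per-executor memory and core values are hypothetical (the original command is cut off), it assumes that only the five slave nodes host containers, it ignores the application master's container, and it assumes the common Spark-on-YARN overhead rule of max(384 MB, 10% of executor memory). Whether vcores actually constrain placement depends on the resource calculator the YARN scheduler is configured with.

```python
import math

def container_mb(executor_mb, min_alloc_mb):
    """Memory YARN would reserve for one executor container."""
    # Assumed overhead rule: max(384 MB, 10% of executor memory).
    overhead_mb = max(384, int(executor_mb * 0.10))
    requested_mb = executor_mb + overhead_mb
    # YARN rounds each request up to a multiple of the minimum container size.
    return math.ceil(requested_mb / min_alloc_mb) * min_alloc_mb

def executors_per_node(node_mb, node_vcores, executor_mb, executor_cores, min_alloc_mb):
    """Executors that fit on one node, limited by memory and by vcores."""
    by_memory = node_mb // container_mb(executor_mb, min_alloc_mb)
    by_vcores = node_vcores // executor_cores
    return min(by_memory, by_vcores)

# Values taken from the question.
NODE_MB = 50 * 1024       # memory for all containers on one host
NODE_VCORES = 19          # vcores per host
MIN_ALLOC_MB = 2 * 1024   # minimum container size
WORKERS = 5               # assumption: only the slave nodes run containers

# Hypothetical per-executor request (not from the question).
EXECUTOR_MB = 6 * 1024
EXECUTOR_CORES = 3

per_node = executors_per_node(NODE_MB, NODE_VCORES, EXECUTOR_MB, EXECUTOR_CORES, MIN_ALLOC_MB)
total = per_node * WORKERS
print(f"{per_node} executors per node, {total} in total")
```

With these hypothetical values, each executor occupies an 8 GB container once overhead and rounding are applied, so memory and vcores both cap a node at six executors.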

Requests hang when using Hiveserver2 Thrift Java client

Question: This is a follow-up to an earlier question in which I asked what the HiveServer2 Thrift Java client API is. This question should be able to stand alone if you don't need any more context.

Unable to find any documentation on how to use the HiveServer2 Thrift API, I put this together. The best reference I could find was the Apache JDBC implementation.

TSocket transport = new TSocket("hive.example.com", 10002);
transport.setTimeout(999999999);
TBinaryProtocol protocol =

Sqoop import : composite primary key and textual primary key

Question: Stack: installed HDP-2.3.2.0-2950 using Ambari 2.1. The source DB schema is on SQL Server and contains several tables whose primary key is one of:

A varchar
Composite: two varchar columns, or one varchar + one int column, or two int columns

There is a large table with ? rows which has three columns in the PK: one int + two varchar columns.

As per the Sqoop documentation: "Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column."

The first question is: What is expected by
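To make the documentation's point about a splitting column concrete, here is a minimal sketch of range-based splitting over a single integer column. It is an illustration only, not Sqoop's implementation: the function `split_ranges` and the column name `id_col` are hypothetical, and the minimum and maximum values are assumed to be already known.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the closed interval [lo, hi] into contiguous half-open ranges."""
    span = hi - lo + 1
    # Integer boundaries spaced as evenly as possible across the interval.
    bounds = [lo + (span * i) // num_mappers for i in range(num_mappers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

def where_clauses(column, lo, hi, num_mappers):
    """One WHERE clause per mapper; together they cover every value once."""
    return [
        f"WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_ranges(lo, hi, num_mappers)
    ]

# Hypothetical column whose values span 0..1000, split across 4 mappers.
for clause in where_clauses("id_col", 0, 1000, 4):
    print(clause)
```

Dividing a range this way presupposes one ordered column, which is why a multi-column key cannot be used directly and why textual columns are harder to split evenly.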

ERROR 1066: Unable to open iterator for alias in Pig, Generic solution

Question: A very common error message in Apache Pig is: ERROR 1066: Unable to open iterator for alias. There are several questions where this error is mentioned, but none of them give a generic approach for dealing with it. Hence this question: What should you do when you get an ERROR 1066: Unable to open iterator for alias?

Answer 1: The message "ERROR 1066: Unable to open iterator for alias myAlias" suggests that something is going wrong in the line where you use myAlias. However, usually you will see this