hdp

How can I run Spark in headless mode with my custom version on HDP?

自古美人都是妖i Submitted on 2021-02-19 08:26:32
Question: How can I run Spark in headless mode? Currently I am running Spark on an HDP 2.6.4 cluster (i.e. Spark 2.2 is installed by default). I have downloaded a headless Spark 2.4.1 release for Scala 2.11 (i.e. no Hadoop JARs are bundled) from https://spark.apache.org/downloads.html. The exact name is: pre-built with Scala 2.11 and user-provided Hadoop. Now, when trying to run it, I follow https://spark.apache.org/docs/latest/hadoop-provided.html and set: export SPARK_DIST_CLASSPATH=$(hadoop classpath)
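A minimal launch sketch for this setup, assuming the headless build is unpacked under /opt/spark-2.4.1 (a hypothetical path) and that 2.6.4.0-91 is replaced with the cluster's actual HDP build string:

    # Point the headless Spark build at the cluster's Hadoop JARs.
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    export SPARK_DIST_CLASSPATH=$(hadoop classpath)

    # On HDP, pass hdp.version explicitly so YARN can resolve the
    # ${hdp.version} placeholders in the stack's mapred-site.xml.
    /opt/spark-2.4.1/bin/spark-shell \
      --master yarn \
      --conf "spark.driver.extraJavaOptions=-Dhdp.version=2.6.4.0-91" \
      --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.6.4.0-91"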

Python program to connect to HBase via Thrift server in HTTP mode

孤街醉人 Submitted on 2021-02-08 13:10:15
Question: I am trying to write a simple program to connect to an HBase server through Thrift, which is started in HTTP mode (the cluster is Kerberized), but I always get a 'read zero bytes' error message. I have referred to the links below, but those examples only work if the Thrift server is started in binary mode (??) https://github.com/joshelser/hbase-thrift1-python-sasl/blob/master/get_row.py. I ran klist and kinit and everything looks fine, and I have also followed the HDP documentation below and my setup is correct https:/
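One way to reach a Thrift server in HTTP mode is the Thrift THttpClient transport with a SPNEGO Authorization header. A sketch under the assumptions that the thrift and kerberos pip packages are installed, that Python bindings were generated from Hbase.thrift (the hbase module below), and that the hostname and port are placeholders:

    import kerberos
    from thrift.transport import THttpClient
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase  # generated with: thrift -gen py Hbase.thrift

    def spnego_header(host):
        # Obtain a GSSAPI token for the HTTP/<host> service principal
        # (requires a valid Kerberos ticket, i.e. kinit has been run).
        _, ctx = kerberos.authGSSClientInit("HTTP@%s" % host)
        kerberos.authGSSClientStep(ctx, "")
        return "Negotiate " + kerberos.authGSSClientResponse(ctx)

    host = "thrift-server.example.com"  # placeholder
    transport = THttpClient.THttpClient("http://%s:9090/" % host)
    transport.setCustomHeaders({"Authorization": spnego_header(host)})
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))

    transport.open()
    print(client.getTableNames())
    transport.close()

In binary mode the same client would use a SASL socket transport instead of THttpClient, which is why the linked binary-mode example fails against an HTTP-mode server.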

Spark + Hive : Number of partitions scanned exceeds limit (=4000)

有些话、适合烂在心里 Submitted on 2021-02-07 11:03:50
Question: We upgraded our Hadoop platform (Spark: 2.3.0, Hive: 3.1), and I'm facing this exception when reading some Hive tables in Spark: "Number of partitions scanned on table 'my_table' exceeds limit (=4000)". The tables we are working on: table1: external table with a total of ~12300 partitions, partitioned by (col1: String, date1: String) (ORC, ZLIB-compressed); table2: external table with a total of 4585 partitions, partitioned by (col21: String, date2: Date, col22: String) (ORC, uncompressed) [A]
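The limit in the message is the metastore-side setting hive.metastore.limit.partition.request, which trips when Spark asks the metastore for every partition of a table; keeping a filter on the partition columns lets Spark push the predicate down so far fewer partitions are requested. A sketch with illustrative names (metastore partition pruning is on by default in Spark 2.x, shown here explicitly):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-pruning-demo")
      // Ask the metastore only for partitions matching the filter,
      // instead of listing all ~12300 of them.
      .config("spark.sql.hive.metastorePartitionPruning", "true")
      .enableHiveSupport()
      .getOrCreate()

    // A predicate on a partition column (date1) can be pushed down;
    // filtering only on non-partition columns cannot.
    val df = spark.table("my_table").where("date1 = '2021-02-07'")
    df.show()

Alternatively, the limit itself can be raised (or disabled with -1) via hive.metastore.limit.partition.request in hive-site.xml, at the cost of heavier metastore calls.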

Ambari HDP throwing FileNotFoundException for mapreduce.tar.gz while submitting the mapreduce job

末鹿安然 Submitted on 2020-06-27 17:20:05
Question: After installing a new Hadoop cluster using Ambari, I tried to submit a MapReduce job, but it failed with the following error. Error: java.io.FileNotFoundException: File does not exist: hdfs://xx-xx-xxx-x:8020/hdp/apps/2.2.9.0-3393/mapreduce/mapreduce.tar.gz Answer 1: The issue was resolved after restarting all components from the Ambari UI. Source: https://stackoverflow.com/questions/36687032/ambari-hdp-throwing-filenotfoundexception-for-mapreduce-tar-gz-while-submitting
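If a restart does not help, a quick diagnostic sketch is to confirm that the framework tarball YARN expects is actually in HDFS and, if not, re-upload the copy HDP ships on the local filesystem (the local path below is illustrative):

    # Does the tarball the job references exist in HDFS?
    hdfs dfs -ls /hdp/apps/2.2.9.0-3393/mapreduce/mapreduce.tar.gz

    # If not, re-upload it from the local HDP installation.
    hdfs dfs -mkdir -p /hdp/apps/2.2.9.0-3393/mapreduce
    hdfs dfs -put /usr/hdp/2.2.9.0-3393/hadoop/mapreduce.tar.gz \
        /hdp/apps/2.2.9.0-3393/mapreduce/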

SaveAsTable in Spark Scala: HDP 3.x

不羁岁月 Submitted on 2020-05-17 06:08:08
Question: I have a DataFrame in Spark that I am saving to Hive as a table, but I get the error message below.

java.lang.RuntimeException: com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector does not allow create table as select. at scala.sys.package$.error(package.scala:27)

Can anyone please help me save this as a table in Hive?

val df3 = df1.join(df2, df1("inv_num") === df2("inv_num") // Join both dataframes on id column
).withColumn("finalSalary", when(df1("salary") < df2("salary"),
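On HDP 3.x the Hive Warehouse Connector rejects CTAS-style writes such as saveAsTable, so the usual pattern is to create the target table through the HWC session first and then write through the connector. A sketch assuming the HWC jar is on the classpath and spark is a configured SparkSession; the database, table, and column names and types are illustrative, since the full schema of df3 is not shown:

    import com.hortonworks.hwc.HiveWarehouseSession

    val hive = HiveWarehouseSession.session(spark).build()
    hive.setDatabase("mydb") // hypothetical database name

    // Create the target table explicitly (HWC cannot CTAS) ...
    hive.createTable("salaries").ifNotExists()
      .column("inv_num", "string")
      .column("finalSalary", "double")
      .create()

    // ... then write the DataFrame into it via the connector.
    df3.write
      .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
      .option("table", "salaries")
      .mode("append")
      .save()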