Question
I would like to know whether Spark's DataFrame writer, when it writes data to an Impala table over JDBC, can also create that table if it was not previously created in Impala.
For example, the code:
myDataframe.write.mode(SaveMode.Overwrite).jdbc(jdbcURL, "books", connectionProperties)
should create the table if it doesn't exist.
The table schema should be determined from the DataFrame schema.
I look forward to your suggestions/ideas.
Regards, Florin
Answer 1:
import org.apache.spark.sql.SaveMode
// Impala JDBC endpoint; AuthMech=0 means no authentication.
val jdbcURL = s"jdbc:impala://192.168.10.555:21050;AuthMech=0"
val connectionProperties = new java.util.Properties()
// Appending to a table that does not yet exist makes Spark create it from the
// DataFrame schema (sqlContext is the Spark 1.x entry point).
sqlContext.sql("select * from temp_table").write.mode(SaveMode.Append).jdbc(jdbcURL, "users", connectionProperties)
Or
// `pro` is a java.util.Properties holding the connection settings (see the sketch below).
df.write.mode("append").jdbc(url = "jdbc:impala://192.168.10.555:21050/test;auth=noSasl", table = "tempTable", connectionProperties = pro)
df.write.mode("overwrite").jdbc(url = "jdbc:impala://192.168.10.555:21050/test;auth=noSasl", table = "tempTable", connectionProperties = pro)
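A minimal sketch of what pro might contain; the driver class name is an assumption based on the Cloudera Impala JDBC 4.1 driver and should be adjusted to the driver you actually use:
val pro = new java.util.Properties()
// Hypothetical driver class name; Spark reads the "driver" key from the connection properties.
pro.setProperty("driver", "com.cloudera.impala.jdbc41.Driver")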
Pass the Impala JDBC driver jar on the command line:
spark-shell --driver-class-path
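For example (the jar path is a placeholder; passing --jars as well ships the driver to the executors):
spark-shell --driver-class-path /path/to/ImpalaJDBC41.jar --jars /path/to/ImpalaJDBC41.jar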
Answer 2:
In the past I created the table via a mutateStatement.execute call with the relevant DDL. I checked with Spark 2.x and append creates it automatically as well, so append is all you need.
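For reference, a minimal sketch of that manual-DDL approach over plain JDBC (reusing the jdbcURL from the first answer; the table name and columns are made up for illustration):
import java.sql.DriverManager
// Open a plain JDBC connection to Impala and issue the CREATE TABLE directly.
val conn = DriverManager.getConnection(jdbcURL)
val mutateStatement = conn.createStatement()
mutateStatement.execute("CREATE TABLE IF NOT EXISTS books (title STRING, author STRING)")
conn.close()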
For JDBC:
jdbcDF.write.mode("append").jdbc(url, table, prop)
For Hive via the Spark 2.x automatic Hive context:
x.write.mode("append").saveAsTable("a_hive_table_xx")
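A self-contained sketch of the Hive path, assuming Spark 2.x with Hive support enabled (the DataFrame x and the table name are illustrative):
import org.apache.spark.sql.SparkSession
// Build a session with Hive support so saveAsTable targets the Hive metastore.
val spark = SparkSession.builder().appName("append-example").enableHiveSupport().getOrCreate()
val x = spark.range(10).toDF("id")
// The first append creates the table automatically; subsequent appends add rows.
x.write.mode("append").saveAsTable("a_hive_table_xx")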
Source: https://stackoverflow.com/questions/50990540/dataframe-to-automatically-create-impala-table-when-writing-to-impala