Question
I would like to know whether Spark's DataFrame writer, when it writes data to an Impala table over JDBC, can also create that table if it was not previously created in Impala.
For example, the code:
myDataframe.write.mode(SaveMode.Overwrite).jdbc(jdbcURL, "books", connectionProperties)
should create the table if it doesn't exist.
The table schema should be determined from the DataFrame schema.
I look forward to your suggestions/ideas.
Regards, Florin
Answer 1:
import org.apache.spark.sql.SaveMode
// Impala JDBC endpoint; AuthMech=0 means no authentication.
val jdbcURL = s"jdbc:impala://192.168.10.555:21050;AuthMech=0"
val connectionProperties = new java.util.Properties()
// Appending to a table that does not yet exist makes Spark create it from the
// DataFrame schema (sqlContext is the Spark 1.x entry point).
sqlContext.sql("select * from temp_table").write.mode(SaveMode.Append).jdbc(jdbcURL, "users", connectionProperties)
Or
// `pro` is a java.util.Properties holding the connection settings (see the sketch below).
df.write.mode("append").jdbc(url = "jdbc:impala://192.168.10.555:21050/test;auth=noSasl", table = "tempTable", connectionProperties = pro)
df.write.mode("overwrite").jdbc(url = "jdbc:impala://192.168.10.555:21050/test;auth=noSasl", table = "tempTable", connectionProperties = pro)
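A minimal sketch of what pro might contain; the driver class name is an assumption based on the Cloudera Impala JDBC 4.1 driver and should be adjusted to the driver you actually use:
val pro = new java.util.Properties()
// Hypothetical driver class name; Spark reads the "driver" key from the connection properties.
pro.setProperty("driver", "com.cloudera.impala.jdbc41.Driver")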
Pass the Impala JDBC driver jar on the command line:
spark-shell --driver-class-path
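For example (the jar path is a placeholder; passing --jars as well ships the driver to the executors):
spark-shell --driver-class-path /path/to/ImpalaJDBC41.jar --jars /path/to/ImpalaJDBC41.jar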
Answer 2:
In the past I created the table via a mutateStatement.execute call with the relevant DDL. I checked with Spark 2.x and append creates it automatically as well, so append is all you need.
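For reference, a minimal sketch of that manual-DDL approach over plain JDBC (reusing the jdbcURL from the first answer; the table name and columns are made up for illustration):
import java.sql.DriverManager
// Open a plain JDBC connection to Impala and issue the CREATE TABLE directly.
val conn = DriverManager.getConnection(jdbcURL)
val mutateStatement = conn.createStatement()
mutateStatement.execute("CREATE TABLE IF NOT EXISTS books (title STRING, author STRING)")
conn.close()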
For JDBC:
jdbcDF.write.mode("append").jdbc(url, table, prop)
For Hive via the Spark 2.x automatic Hive context:
x.write.mode("append").saveAsTable("a_hive_table_xx")
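A self-contained sketch of the Hive path, assuming Spark 2.x with Hive support enabled (the DataFrame x and the table name are illustrative):
import org.apache.spark.sql.SparkSession
// Build a session with Hive support so saveAsTable targets the Hive metastore.
val spark = SparkSession.builder().appName("append-example").enableHiveSupport().getOrCreate()
val x = spark.range(10).toDF("id")
// The first append creates the table automatically; subsequent appends add rows.
x.write.mode("append").saveAsTable("a_hive_table_xx")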
Source: https://stackoverflow.com/questions/50990540/dataframe-to-automatically-create-impala-table-when-writing-to-impala