spark Dataframe execute UPDATE statement

Submitted by ⅰ亾dé卋堺 on 2019-12-10 22:10:45

Question


Hi guys,

I need to perform JDBC operations using an Apache Spark DataFrame. Basically, I have a historical JDBC table called Measures on which I have to do two operations:

1. Set endTime validity attribute of the old measure record to the current time

2. Insert a new measure record setting endTime to 9999-12-31

Can someone tell me how to perform (if it is possible) an update statement for the first operation and an insert for the second?

I tried to use this statement for the first operation:

val dfWriter = df.write.mode(SaveMode.Overwrite)
dfWriter.jdbc("jdbc:postgresql:postgres", tableName, prop)

But it doesn't work because there is a duplicate key violation. And if an update is possible, how can we also perform a delete statement?

Thanks in advance.


Answer 1:


I don't think this is supported out of the box by Spark yet. What you can do is iterate over the DataFrame's partitions with foreachPartition() and manually update/delete rows in the table through the plain JDBC API.

Here is a link to a similar question: Spark Dataframes UPSERT to Postgres Table
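The approach above can be sketched roughly as follows. This is a minimal sketch, not a tested implementation: it assumes a Measures table with hypothetical columns (id, value, startTime, endTime), that the DataFrame `df` has (id, value) in that order, and that the JDBC URL and `prop` are the ones from the question.

```scala
import java.sql.DriverManager

// Sketch only: column names (id, value, startTime, endTime) and the
// schema of `df` are assumptions, not taken from the original question.
df.foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
  // Open one connection per partition; one per row would be far too slow.
  val conn = DriverManager.getConnection("jdbc:postgresql:postgres", prop)
  val expireOld = conn.prepareStatement(
    "UPDATE Measures SET endTime = now() " +
    "WHERE id = ? AND endTime = '9999-12-31'")
  val insertNew = conn.prepareStatement(
    "INSERT INTO Measures (id, value, startTime, endTime) " +
    "VALUES (?, ?, now(), '9999-12-31')")
  try {
    rows.foreach { row =>
      val id = row.getLong(0)
      // 1. Close the validity interval of the currently open record.
      expireOld.setLong(1, id)
      expireOld.executeUpdate()
      // 2. Insert the new measure, open-ended until 9999-12-31.
      insertNew.setLong(1, id)
      insertNew.setDouble(2, row.getDouble(1))
      insertNew.executeUpdate()
    }
  } finally {
    conn.close()
  }
}
```

For real workloads you would also want to batch the statements (addBatch/executeBatch) and wrap each partition's work in a transaction, so a mid-partition failure does not leave half-expired records.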



来源:https://stackoverflow.com/questions/35151528/spark-dataframe-execute-update-statement
