Delete functionality with spark sql dataframe

Submitted by 泪湿孤枕 on 2021-02-07 08:47:36

Question


I have a requirement to load/delete specific records from a Postgres database in my Spark application. For loading, I am using a Spark DataFrame in the format below:

sqlContext.read.format("jdbc").options(Map("url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(select * from employee where emp_id > 1000) as filtered_emp")).load()

To delete the data, I am writing SQL directly instead of using DataFrames:

delete from employee where emp_id > 1000

The question is: is there a Spark way of deleting records in the database, something similar to the snippet below, or is direct SQL the only way?

sqlContext.read.format("jdbc").options(Map("url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(delete from employee where emp_id > 1000) as filtered_emp")).load()

Answer 1:


If you want to modify (delete records from) the actual source of the data, i.e. tables in Postgres, then Spark is not a great fit. You can use a JDBC client directly to achieve the same thing.
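For the plain-JDBC route, a minimal sketch in Scala (the table and column names come from the question; the URL and credentials are placeholders you would replace with your own):

```scala
import java.sql.DriverManager

object JdbcDelete {
  // Parameterized statement; the emp_id threshold is bound at call time.
  val DeleteSql = "DELETE FROM employee WHERE emp_id > ?"

  /** Deletes matching rows and returns the number of rows removed. */
  def deleteAbove(url: String, user: String, password: String, threshold: Long): Int = {
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val stmt = conn.prepareStatement(DeleteSql)
      try {
        stmt.setLong(1, threshold)
        stmt.executeUpdate() // returns the affected-row count
      } finally stmt.close()
    } finally conn.close()
  }
}

// Example call (placeholder URL/credentials):
// JdbcDelete.deleteAbove("jdbc:postgresql://host:5432/db", "user", "xxxxxx", 1000L)
```

Using a `PreparedStatement` keeps the threshold out of the SQL text, which avoids injection issues if the value ever comes from user input.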

If you want to do this anyway (in a distributed manner, based on keys you are computing as part of your DataFrames), you can write the same JDBC client code against the DataFrame that holds the logic/trigger info for deleting records, and execute it on multiple workers in parallel.
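That pattern can be sketched without Spark-specific classes: collect the keys to delete from your DataFrame, split them into batches, and issue the deletes concurrently, one connection per batch. Inside Spark, the same per-batch body would live in a `foreachPartition` call on the DataFrame. The `batches` helper and all URL/credential values below are hypothetical placeholders, not part of any library API:

```scala
import java.sql.DriverManager
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelDelete {
  /** Splits the ids to delete into fixed-size batches (hypothetical helper). */
  def batches(ids: Seq[Long], size: Int): Seq[Seq[Long]] =
    ids.grouped(size).toSeq

  /** Deletes one batch over its own connection; returns the rows removed. */
  def deleteBatch(url: String, user: String, password: String, ids: Seq[Long]): Int = {
    val conn = DriverManager.getConnection(url, user, password)
    try {
      val stmt = conn.prepareStatement("DELETE FROM employee WHERE emp_id = ?")
      try {
        ids.foreach { id => stmt.setLong(1, id); stmt.addBatch() }
        stmt.executeBatch().sum // sum of per-statement affected-row counts
      } finally stmt.close()
    } finally conn.close()
  }

  /** Runs the batches concurrently, mirroring per-partition work on executors. */
  def deleteAll(url: String, user: String, password: String,
                ids: Seq[Long], batchSize: Int = 1000): Int = {
    val work = batches(ids, batchSize).map(b => Future(deleteBatch(url, user, password, b)))
    Await.result(Future.sequence(work), Duration.Inf).sum
  }
}
```

One connection per batch matters here: JDBC connections are not thread-safe, and on a Spark executor each partition would likewise open its own connection rather than sharing one across tasks.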



Source: https://stackoverflow.com/questions/39576874/delete-functionality-with-spark-sql-dataframe
