INSERT & UPDATE MySql table using PySpark DataFrames and JDBC

Posted by 故事扮演 on 2021-02-08 09:36:06

Question


I'm trying to insert and update some data in MySQL using PySpark SQL DataFrames and a JDBC connection.

I've succeeded in inserting new data using SaveMode.Append. Is there a way to both update existing data and insert new data into a MySQL table from PySpark SQL?

My code to insert is:

myDataFrame.write.mode("append").jdbc(JDBCurl, mySqlTable, properties=connectionProperties)

If I change the mode to "overwrite", it drops the full table and creates a new one. I'm looking for something like the "ON DUPLICATE KEY UPDATE" clause available in MySQL.
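For reference, the MySQL upsert I'd like to emulate looks roughly like this (table and column names are placeholders, not from my actual schema):

```sql
INSERT INTO mySqlTable (id, name, qty)
VALUES (1, 'widget', 5)
ON DUPLICATE KEY UPDATE
    name = VALUES(name),
    qty  = VALUES(qty);
```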

Any help on this is highly appreciated.


Answer 1:


  1. Create a view in MySQL: CREATE VIEW <viewName> AS SELECT ... FROM <tableName>
  2. Create a trigger in MySQL to upsert after insert (note the DELIMITER change, which is required so the semicolons inside the body don't end the statement early):

DELIMITER $$

CREATE TRIGGER trigger_name
    AFTER INSERT
    ON <viewName> FOR EACH ROW
BEGIN
    -- statements
    -- INSERT ... ON DUPLICATE KEY UPDATE statement
END$$

DELIMITER ;

ref - https://www.mysqltutorial.org/mysql-triggers/mysql-after-insert-trigger/

  3. Write data to the view <viewName> from Spark
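The trigger body in step 2 needs a concrete INSERT ... ON DUPLICATE KEY UPDATE statement. As a sketch, such a statement can be generated from the target table's column list; the table and column names below are hypothetical examples, not from the original post:

```python
def build_upsert(table, columns, key_columns):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement for MySQL.

    Non-key columns are updated from the incoming row via VALUES();
    key columns are left alone since they identify the row.
    """
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    updates = ", ".join(
        f"{c} = VALUES({c})" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )

# Hypothetical table with primary key "id":
sql = build_upsert("mySqlTable", ["id", "name", "qty"], {"id"})
print(sql)
```

The generated string uses %s placeholders, so it could also be executed directly with a MySQL client library if you later choose a per-partition upsert instead of the view-plus-trigger route.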


Source: https://stackoverflow.com/questions/62695035/insert-update-mysql-table-using-pyspark-dataframes-and-jdbc
