Spark DataFrames UPSERT to Postgres Table

北荒 2020-11-30 03:50

I am using Apache Spark DataFrames to join two data sources and get the result as another DataFrame. I want to write the result to another Postgres table. I see this option
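(For context, a minimal sketch of the stock DataFrameWriter JDBC path, which is presumably the option referred to; `resultDf`, the URL, and the credentials are hypothetical placeholders. Note that its save modes cover append and overwrite, but there is no upsert mode.)

import java.util.Properties
import org.apache.spark.sql.SaveMode

val props = new Properties()
props.setProperty("user", "postgres")              // hypothetical credentials
props.setProperty("password", "secret")
props.setProperty("driver", "org.postgresql.Driver")

// resultDf is the joined DataFrame; URL and table name are placeholders.
resultDf.write
  .mode(SaveMode.Append)   // Append/Overwrite/Ignore/ErrorIfExists only; no upsert
  .jdbc("jdbc:postgresql://localhost:5432/mydb", "target_table", props)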

4 Answers
  •  [愿得一人]
    2020-11-30 04:15

    KrisP has the right of it. The best way to do an upsert is not through a prepared statement. It's important to note that his method will insert one row at a time, with as many partitions as you have workers. If you want to insert in batches instead, you can do that as well:

    import java.sql.{Connection, DriverManager, PreparedStatement}

    val numWorkers = 4    // coalesce to one partition per worker
    val batchSize  = 1000 // rows per JDBC batch

    dataframe.coalesce(numWorkers).rdd.foreachPartition { partition =>
      val dbc: Connection = DriverManager.getConnection("JDBC_URL")
      val st: PreparedStatement = dbc.prepareStatement("YOUR PREPARED STATEMENT")

      partition.grouped(batchSize).foreach { batch =>
        batch.foreach { row =>
          st.setDouble(1, row.getDouble(1)) // bind each parameter from the Row
          st.addBatch()
        }
        st.executeBatch() // one round trip to Postgres per batch
      }
      st.close()
      dbc.close()
    }
    

    This executes the batches on each worker and closes the DB connection once its partition is done. It gives you control over the number of workers and the number of rows per batch, and lets you work within those constraints.
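
    For completeness, a minimal sketch of what the prepared statement itself could look like for a true upsert on Postgres 9.5+, assuming a hypothetical table my_table(id BIGINT PRIMARY KEY, value DOUBLE PRECISION); the ON CONFLICT clause is what turns the INSERT into an upsert:

    import java.sql.PreparedStatement
    import org.apache.spark.sql.Row

    // Hypothetical target: my_table(id BIGINT PRIMARY KEY, value DOUBLE PRECISION)
    val upsertSql =
      """INSERT INTO my_table (id, value)
        |VALUES (?, ?)
        |ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value""".stripMargin

    // Bind one Row's columns to the statement's placeholders and queue it.
    def bindRow(st: PreparedStatement, row: Row): Unit = {
      st.setLong(1, row.getLong(0))     // id    -> first placeholder
      st.setDouble(2, row.getDouble(1)) // value -> second placeholder
      st.addBatch()
    }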
