Spark connection pooling - Is this the right approach?


I am using ForeachWriter for Redis in a similar way, where the pool is referenced only inside process. Your code would look something like this:

def open(partitionId: Long, version: Long): Boolean = {
    // nothing to open per partition; connections are borrowed inside process
    true
  }

  // Record is the element type of the stream (it carries topic and value fields)
  def process(record: Record): Unit = {
    // borrow a connection, save the data, then return it to the pool
    val influxDB = InfluxConnectionPool.getConnectionFromPool
    MyService.saveData(influxDB, record.topic, record.value)
    InfluxConnectionPool.returnConnectionToPool(influxDB)
  }
For comparison, the generic ForeachWriter skeleton:

datasetOfString.writeStream.foreach(new ForeachWriter[String] {
      def open(partitionId: Long, version: Long): Boolean = {
        // open the connection; return true to process this partition
        true
      }
      def process(record: String): Unit = {
        // write the string to the connection
      }
      def close(errorOrNull: Throwable): Unit = {
        // close the connection
      }
    })

From the docs of ForeachWriter:

Each task will get a fresh serialized-deserialized copy of the provided object

So whatever you initialize outside the ForeachWriter runs only on the driver; the copy each task receives will not carry over non-serializable state such as open connections.

You need to initialize the connection pool and open the connection in the open method.
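Putting the two answers together, a sketch of a pool-backed writer under the assumptions above (the Record case class, datasetOfRecords, and the MyService.saveData helper are illustrative, not part of the Spark API):

import org.apache.spark.sql.ForeachWriter
import org.influxdb.InfluxDB

case class Record(topic: String, value: String)

val query = datasetOfRecords.writeStream
  .foreach(new ForeachWriter[Record] {
    private var influxDB: InfluxDB = _

    def open(partitionId: Long, version: Long): Boolean = {
      // runs on the executor after deserialization: a safe place for connections
      influxDB = InfluxConnectionPool.getConnectionFromPool
      true
    }

    def process(record: Record): Unit = {
      MyService.saveData(influxDB, record.topic, record.value)
    }

    def close(errorOrNull: Throwable): Unit = {
      // hand the connection back so later tasks on this executor can reuse it
      InfluxConnectionPool.returnConnectionToPool(influxDB)
    }
  })
  .start()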
