How to write JDBC Sink for Spark Structured Streaming [SparkException: Task not serializable]?

Asked by 再見小時候 on 2020-12-13 21:27

I need a JDBC sink for my Spark Structured Streaming data frame. At the moment, as far as I know, the DataFrame API lacks a writeStream-to-JDBC implementation (neither …

3 Answers
  • 2020-12-13 21:53

    Just define JDBCSink in a separate file rather than as an inner class, since an inner class captures a reference to its (non-serializable) outer object. For example:
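
    A minimal sketch of such a standalone sink, assuming a plain ForeachWriter[Row], a hypothetical target table my_table, and JDBC connection details of your own:

        // JDBCSink.scala -- a top-level class in its own file, so no outer reference is captured
        import java.sql.{Connection, DriverManager, Statement}
        import org.apache.spark.sql.{ForeachWriter, Row}

        class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
          // The connection is created per partition in open(), never serialized with the class
          var connection: Connection = _
          var statement: Statement = _

          override def open(partitionId: Long, epochId: Long): Boolean = {
            connection = DriverManager.getConnection(url, user, pwd)
            statement = connection.createStatement()
            true
          }

          override def process(row: Row): Unit = {
            // Hypothetical schema: a single string column written into my_table
            statement.executeUpdate(s"INSERT INTO my_table (value) VALUES ('${row.getString(0)}')")
          }

          override def close(errorOrNull: Throwable): Unit = {
            if (connection != null) connection.close()
          }
        }

    Because the class only holds a few strings, Spark can serialize the JDBCSink instance and ship it to the executors without dragging anything else along.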

  • 2020-12-13 22:02

    In case somebody encounters this in an interactive workbook, this solution also works:

    Instead of saving the JDBCSink class to a separate file, you can also declare it in a separate package (a "packaged cell") within the same notebook and import that package in the cell where you use it. This is well described at https://docs.databricks.com/user-guide/notebooks/package-cells.html.
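
    A sketch of what that might look like, assuming a hypothetical package name com.example.sinks (the JDBCSink body is the same ForeachWriter shown in the previous answer):

        // Cell 1 -- a packaged cell: nothing but the package declaration and its contents,
        // so JDBCSink compiles as an ordinary top-level class, not as an inner class of the notebook
        package com.example.sinks

        import java.sql.{Connection, DriverManager, Statement}
        import org.apache.spark.sql.{ForeachWriter, Row}

        class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
          var connection: Connection = _
          var statement: Statement = _

          override def open(partitionId: Long, epochId: Long): Boolean = {
            connection = DriverManager.getConnection(url, user, pwd)
            statement = connection.createStatement()
            true
          }

          override def process(row: Row): Unit =
            statement.executeUpdate(s"INSERT INTO my_table (value) VALUES ('${row.getString(0)}')")

          override def close(errorOrNull: Throwable): Unit =
            if (connection != null) connection.close()
        }

        // Cell 2 -- in the cell that defines the streaming query, import the packaged class
        import com.example.sinks.JDBCSink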

  • 2020-12-13 22:10

    Looks like the offender here is the use of import spark.implicits._ inside the JDBCSink class:

    • JDBCSink must be serializable
    • By adding this import, you make JDBCSink reference the non-serializable SparkSession, which is then serialized along with it (technically, SparkSession extends Serializable, but it is not meant to be deserialized on the worker nodes)

    The good news: you're not using this import, so if you just remove it, this should work.
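
    With the import removed, a sketch of the driver-side wiring (the rate source and connection string are placeholders for illustration):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("jdbc-sink-demo").getOrCreate()

        // Hypothetical streaming source; substitute your own
        val df = spark.readStream
          .format("rate")
          .load()
          .selectExpr("CAST(value AS STRING) AS value")

        // JDBCSink no longer references spark or spark.implicits._,
        // so only the small, serializable sink instance is shipped to the executors
        val query = df.writeStream
          .foreach(new JDBCSink("jdbc:postgresql://host:5432/db", "user", "pwd"))
          .outputMode("append")
          .start()

        query.awaitTermination()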
