How to write JDBC Sink for Spark Structured Streaming [SparkException: Task not serializable]?

Asked by 再見小時候 on 2020-12-13 21:27

I need a JDBC sink for my Spark Structured Streaming data frame. At the moment, as far as I know, the DataFrame API lacks a writeStream-to-JDBC implementation (neither …

3 Answers
  • 2020-12-13 21:53

    Just define JDBCSink in a separate file rather than as an inner class, since an inner class captures a reference to its (non-serializable) outer object. For example:
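
    A minimal sketch of such a standalone sink, assuming a plain ForeachWriter[Row], a hypothetical target table my_table, and JDBC connection details of your own:

        // JDBCSink.scala -- a top-level class in its own file, so no outer reference is captured
        import java.sql.{Connection, DriverManager, Statement}
        import org.apache.spark.sql.{ForeachWriter, Row}

        class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
          // The connection is created per partition in open(), never serialized with the class
          var connection: Connection = _
          var statement: Statement = _

          override def open(partitionId: Long, epochId: Long): Boolean = {
            connection = DriverManager.getConnection(url, user, pwd)
            statement = connection.createStatement()
            true
          }

          override def process(row: Row): Unit = {
            // Hypothetical schema: a single string column written into my_table
            statement.executeUpdate(s"INSERT INTO my_table (value) VALUES ('${row.getString(0)}')")
          }

          override def close(errorOrNull: Throwable): Unit = {
            if (connection != null) connection.close()
          }
        }

    Because the class only holds a few strings, Spark can serialize the JDBCSink instance and ship it to the executors without dragging anything else along.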

  • 2020-12-13 22:02

    In case somebody encounters this in an interactive workbook, this solution also works:

    Instead of saving the JDBCSink class to a separate file, you can also declare it in a separate package (a "packaged cell") within the same notebook and import that package in the cell where you use it. This is well described at https://docs.databricks.com/user-guide/notebooks/package-cells.html.
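
    A sketch of what that might look like, assuming a hypothetical package name com.example.sinks (the JDBCSink body is the same ForeachWriter shown in the previous answer):

        // Cell 1 -- a packaged cell: nothing but the package declaration and its contents,
        // so JDBCSink compiles as an ordinary top-level class, not as an inner class of the notebook
        package com.example.sinks

        import java.sql.{Connection, DriverManager, Statement}
        import org.apache.spark.sql.{ForeachWriter, Row}

        class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
          var connection: Connection = _
          var statement: Statement = _

          override def open(partitionId: Long, epochId: Long): Boolean = {
            connection = DriverManager.getConnection(url, user, pwd)
            statement = connection.createStatement()
            true
          }

          override def process(row: Row): Unit =
            statement.executeUpdate(s"INSERT INTO my_table (value) VALUES ('${row.getString(0)}')")

          override def close(errorOrNull: Throwable): Unit =
            if (connection != null) connection.close()
        }

        // Cell 2 -- in the cell that defines the streaming query, import the packaged class
        import com.example.sinks.JDBCSink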

  • 2020-12-13 22:10

    Looks like the offender here is the use of import spark.implicits._ inside the JDBCSink class:

    • JDBCSink must be serializable
    • By adding this import, you make JDBCSink reference the non-serializable SparkSession, which is then serialized along with it (technically, SparkSession extends Serializable, but it is not meant to be deserialized on the worker nodes)

    The good news: you're not using this import, so if you just remove it, this should work.
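
    With the import removed, a sketch of the driver-side wiring (the rate source and connection string are placeholders for illustration):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder().appName("jdbc-sink-demo").getOrCreate()

        // Hypothetical streaming source; substitute your own
        val df = spark.readStream
          .format("rate")
          .load()
          .selectExpr("CAST(value AS STRING) AS value")

        // JDBCSink no longer references spark or spark.implicits._,
        // so only the small, serializable sink instance is shipped to the executors
        val query = df.writeStream
          .foreach(new JDBCSink("jdbc:postgresql://host:5432/db", "user", "pwd"))
          .outputMode("append")
          .start()

        query.awaitTermination()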
