I need a JDBC sink for my Spark Structured Streaming DataFrame. At the moment, as far as I know, the DataFrame API lacks a writeStream implementation for JDBC.
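The usual workaround is to implement Spark's ForeachWriter and hand it to writeStream.foreach. Below is a minimal sketch; the table name my_table, the single-column schema, and the connection details are placeholder assumptions, and it presumes a JDBC driver on the classpath:

```scala
import java.sql.{Connection, DriverManager, Statement}
import org.apache.spark.sql.{ForeachWriter, Row}

// Hypothetical JDBC sink: table name and SQL are placeholders.
class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  var connection: Connection = _
  var statement: Statement = _

  // Called once per partition and epoch; return true to process its rows.
  def open(partitionId: Long, epochId: Long): Boolean = {
    connection = DriverManager.getConnection(url, user, pwd)
    statement = connection.createStatement()
    true
  }

  // Called once per row. Naive interpolated SQL for illustration only;
  // real code should use a PreparedStatement.
  def process(record: Row): Unit = {
    statement.executeUpdate(
      s"INSERT INTO my_table VALUES ('${record.getString(0)}')")
  }

  // Called when the partition is done (errorOrNull is null on success).
  def close(errorOrNull: Throwable): Unit = {
    if (connection != null) connection.close()
  }
}
```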
Just define JDBCSink in a separate file rather than defining it as an inner class, which may capture the outer reference. You can then wire it into the stream as in the sketch below.
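A minimal usage sketch, assuming the JDBCSink above is compiled from its own file; streamingDF and the connection settings are placeholders:

```scala
// Placeholder connection settings and a placeholder streaming DataFrame.
val writer = new JDBCSink("jdbc:mysql://localhost:3306/test", "user", "pwd")

val query = streamingDF.writeStream
  .foreach(writer)          // route every output row through the sink
  .outputMode("update")
  .start()

query.awaitTermination()
```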
In case somebody encounters this in an interactive workbook, this solution also works: instead of saving the JDBCSink class to a separate file, you can also just declare it in a separate package ("Packaged cell") within the same workbook and import that package in the cell where you are using it. Well described here: https://docs.databricks.com/user-guide/notebooks/package-cells.html
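A sketch of what that looks like in a Databricks notebook; the package name is a placeholder, and the sink body is reduced to no-ops here just to show the two-cell mechanics:

```scala
// Cell 1, a "package cell": the cell contains only the package and its contents.
package com.example.streaming

import org.apache.spark.sql.{ForeachWriter, Row}

// Stand-in for the real JDBCSink body.
class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  def open(partitionId: Long, epochId: Long): Boolean = true
  def process(record: Row): Unit = ()   // real JDBC writes go here
  def close(errorOrNull: Throwable): Unit = ()
}
```

```scala
// Cell 2: any later cell can import from the packaged cell.
import com.example.streaming.JDBCSink

val writer = new JDBCSink("jdbc:mysql://localhost:3306/test", "user", "pwd")
```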
Looks like the offender here is the use of import spark.implicits._ inside the JDBCSink class:

- JDBCSink must be serializable.
- JDBCSink references the non-serializable SparkSession, which is then serialized along with it (technically, SparkSession extends Serializable, but it's not meant to be deserialized on the worker nodes).

The good news: you're not using this import, so if you just remove it, this should work.
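A sketch of the problem as it would appear in a notebook cell; this compiles only under the assumption that spark is the notebook-provided SparkSession, and the JDBC writes are elided:

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

class JDBCSink(url: String, user: String, pwd: String) extends ForeachWriter[Row] {
  import spark.implicits._  // the offending line: it pins a reference to the
                            // cell's SparkSession inside the class, so
                            // serializing JDBCSink drags `spark` along with it
  def open(partitionId: Long, epochId: Long): Boolean = true
  def process(record: Row): Unit = ()   // JDBC writes elided for brevity
  def close(errorOrNull: Throwable): Unit = ()
}

// Fix: delete the import. Nothing in the class uses it, and without it the
// class no longer references the SparkSession, so it serializes cleanly.
```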