Spark DataFrame write to JDBC - Can't get JDBC type for array<array<int>>


Question


I'm trying to save a dataframe via JDBC (to postgres). One of the fields is of type Array[Array[Int]]. Without any casting, it fails with

Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for array<array<int>>
    at ... (JdbcUtils.scala:148)

I added explicit casting to the array datatype to guide the transformation:

  import org.apache.spark.sql.SaveMode
  import org.apache.spark.sql.types.{ArrayType, IntegerType}

  val edgesDF = readings
    .map { case ((a, b), (_, d, e, arrayArrayInt)) => (a, b, d, e, arrayArrayInt) }
    .toDF("A", "B", "D", "E", "arrays")
  edgesDF
    .withColumn("arrays_", edgesDF.col("arrays").cast(ArrayType(ArrayType(IntegerType))))
    .drop("arrays")
    .withColumnRenamed("arrays_", "arrays")
    .write
    .mode(SaveMode.ErrorIfExists)
    .jdbc(url = dbURLWithSchema, table = "mytable", connectionProperties = dbProps)

But it still fails with the same exception.

How can I get this data to persist to DB?


Answer 1:


You can't store array<array<int>> through the JDBC writer as-is; it has no mapping from a nested array type to a JDBC type, which is exactly what the exception says.

One option is to flatten the nested array into a single delimited string with a simple UDF, as below:

import org.apache.spark.sql.functions._

val arrToString = udf((value: Seq[Seq[Int]]) => {
  value.map(x=> x.map(_.toString).mkString(",")).mkString("::")
})

// this UDF converts array<array<int>> into a string such as 1,2,3::3,4,5::6,7

df.withColumn("arrays", arrToString($"arrays"))
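
Putting it together with the write from the question, a minimal sketch (assuming the arrToString UDF above and the df, dbURLWithSchema, and dbProps names from the question) could look like this:

import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.functions._

// Hypothetical end-to-end write: replace the nested-array column with its
// string form, then hand the frame to the JDBC writer. Names such as df,
// dbURLWithSchema and dbProps are assumed from the question.
df.withColumn("arrays", arrToString(col("arrays")))  // "arrays" is now a plain string column
  .write
  .mode(SaveMode.ErrorIfExists)
  .jdbc(url = dbURLWithSchema, table = "mytable", connectionProperties = dbProps)

On the Postgres side the column then arrives as text; if you need the nested structure back later, you have to rebuild it by splitting on "::" and "," after reading.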

Hope this helps!



Source: https://stackoverflow.com/questions/45563340/spark-dataframe-write-to-jdbc-cant-get-jdbc-type-for-arrayarrayint
