How to convert a column that has been read as a string into a column of arrays? i.e. convert from below schema
scala> test.printSchema
root
|-- a: long (
Using a UDF would give you exact required schema. Like this:
val toArray = udf((b: String) => b.split(",").map(_.toLong))
val test1 = test.withColumn("b", toArray(col("b")))
It would give you schema as follows:
scala> test1.printSchema
root
|-- a: long (nullable = true)
|-- b: array (nullable = true)
| |-- element: long (containsNull = true)
+---+-----+
| a| b |
+---+-----+
| 1|[2,3]|
+---+-----+
| 2|[4,5]|
+---+-----+
As far as applying schema on file read itself is concerned, I think that is a tough task. So, for now you can apply transformation after creating DataFrameReader of test.
I hope this helps!