Issue with VectorUDT when using Spark ML
Question

I am writing a UDAF to be applied to a Spark data frame column of type Vector (spark.ml.linalg.Vector). I rely on the spark.ml.linalg package so that I do not have to go back and forth between data frames and RDDs.

Inside the UDAF, I have to specify a data type for the input, buffer, and output schemas:

    def inputSchema = new StructType().add("features", new VectorUDT())
    def bufferSchema: StructType = StructType(StructField("list_of_similarities", ArrayType(new VectorUDT(), true), true) :: Nil)
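For comparison, here is a minimal sketch of the same two schema declarations written without calling the VectorUDT constructor. It assumes Spark 2.x, where, as far as I know, org.apache.spark.ml.linalg.VectorUDT is private to Spark's own packages and the public handle for the vector column type is org.apache.spark.ml.linalg.SQLDataTypes.VectorType. Whether this is the actual issue depends on the error being seen. The object name UdafSchemas is hypothetical and only groups the two values so that the snippet compiles on its own.

```scala
import org.apache.spark.ml.linalg.SQLDataTypes
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}

// Hypothetical holder object, used only to make the sketch self-contained.
object UdafSchemas {
  // Input: one column named "features" holding an ml.linalg.Vector.
  // SQLDataTypes.VectorType is the public DataType for ml vectors (assumes Spark 2.x).
  val inputSchema: StructType =
    new StructType().add("features", SQLDataTypes.VectorType)

  // Buffer: a nullable array of vectors, mirroring the schema in the question.
  val bufferSchema: StructType = StructType(
    StructField(
      "list_of_similarities",
      ArrayType(SQLDataTypes.VectorType, containsNull = true),
      nullable = true
    ) :: Nil
  )
}
```

Inside a UDAF these would be the bodies of the inputSchema and bufferSchema overrides; the field names and nullability are the same as in the question, and only the way the vector type is obtained changes.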