Spark CountVectorizer returns UDT instead of Vector [duplicate]
Question

This question already has an answer here: Understanding Representation of Vector Column in Spark SQL (1 answer). Closed last year.

I am trying to create a vector of token counts for an LDA analysis in Spark 2.3.0. I have followed several tutorials, and each of them uses CountVectorizer to convert an Array of String to a Vector. I ran this short example in my Databricks notebook:

import org.apache.spark.ml.feature.CountVectorizer
val testW = Seq(
  (8, Array("Zara", "Nuha", "Ayan", "markle")),
  (9,