Understanding Representation of Vector Column in Spark SQL
Question: Before I used VectorAssembler() to consolidate some one-hot encoded categorical features, my DataFrame looked like this:

| Numerical | HotEncoded1    | HotEncoded2   |
|-----------|----------------|---------------|
| 14460.0   | (44,[5],[1.0]) | (3,[0],[1.0]) |
| 14460.0   | (44,[9],[1.0]) | (3,[0],[1.0]) |
| 15181.0   | (44,[1],[1.0]) | (3,[0],[1.0]) |

The first column is numerical, and the other two columns hold the one-hot encoded representations of the categorical features. After applying VectorAssembler(), my output becomes:

[(48,[0,1,9],[14460
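The notation `(44,[5],[1.0])` is how Spark ML prints a sparse vector: `(size, [indices of non-zero entries], [values at those indices])`. VectorAssembler concatenates its input columns into a single vector, so the assembled size should be 1 + 44 + 3 = 48, which matches the leading `48` in the output. The sketch below models that concatenation in plain Python, assuming the input columns are assembled in the order Numerical, HotEncoded1, HotEncoded2. It is not PySpark code: `SparseVec` and `assemble` are hypothetical helpers written only to illustrate how each column's indices get shifted by the combined size of the columns before it.

```python
# Dependency-free sketch of sparse-vector concatenation, in the spirit of
# what VectorAssembler does. SparseVec and assemble are HYPOTHETICAL helpers
# for illustration only; they are not part of PySpark.
from typing import List, NamedTuple, Union


class SparseVec(NamedTuple):
    """Hypothetical stand-in for a Spark ML sparse vector."""
    size: int            # total length of the vector
    indices: List[int]   # positions of the non-zero entries
    values: List[float]  # values stored at those positions

    def to_dense(self) -> List[float]:
        # expand to a full list, filling unlisted positions with 0.0
        dense = [0.0] * self.size
        for i, v in zip(self.indices, self.values):
            dense[i] = v
        return dense


def assemble(columns: List[Union[float, SparseVec]]) -> SparseVec:
    """Concatenate scalars and sparse vectors into one sparse vector.

    Assumption: columns are passed in the same order as the assembler's
    input columns. Each column's indices are shifted by the total size of
    the columns before it; zero scalars are not stored.
    """
    offset = 0
    indices: List[int] = []
    values: List[float] = []
    for col in columns:
        if isinstance(col, SparseVec):
            # shift this vector's indices past everything assembled so far
            for i, v in zip(col.indices, col.values):
                indices.append(offset + i)
                values.append(v)
            offset += col.size
        else:
            # a plain numeric column occupies exactly one slot
            if col != 0.0:
                indices.append(offset)
                values.append(float(col))
            offset += 1
    return SparseVec(offset, indices, values)


if __name__ == "__main__":
    # the three rows from the table above
    rows = [
        [14460.0, SparseVec(44, [5], [1.0]), SparseVec(3, [0], [1.0])],
        [14460.0, SparseVec(44, [9], [1.0]), SparseVec(3, [0], [1.0])],
        [15181.0, SparseVec(44, [1], [1.0]), SparseVec(3, [0], [1.0])],
    ]
    for row in rows:
        v = assemble(row)
        print((v.size, v.indices, v.values))
```

Under this model, the numerical value lands at index 0, HotEncoded1 occupies indices 1 to 44, and HotEncoded2 occupies indices 45 to 47, so every assembled row has size 48.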