Spark Scala Cosine Similarity Matrix
Question

I'm new to Scala (PySpark guy) and trying to calculate cosine similarity between rows (items).

I followed this to create a sample df as an example: Spark, Scala, DataFrame: create feature vectors

import org.apache.spark.ml.feature.VectorAssembler

val df = sc.parallelize(Seq(
  (1, "cat1", 1), (1, "cat2", 3), (1, "cat9", 5),
  (2, "cat4", 6), (2, "cat9", 2), (2, "cat10", 1),
  (3, "cat1", 5), (3, "cat7", 16), (3, "cat8", 2)
)).toDF("userID", "category", "frequency")

// Create a sorted array of categories