PySpark ML: Get KMeans cluster statistics
Question

I have built a KMeansModel. My results are stored in a PySpark DataFrame called transformed.

(a) How do I interpret the contents of transformed?

(b) How do I create one or more pandas DataFrames from transformed that show summary statistics for each of the 13 features for each of the 14 clusters?

    from pyspark.ml.clustering import KMeans

    # Trains a k-means model.
    kmeans = KMeans().setK(14).setSeed(1)
    model = kmeans.fit(X_spark_scaled)
    # Fits a model to the input dataset with optional