PySpark and PCA: How can I extract the eigenvectors of this PCA? How can I calculate how much variance they explain?

忘掉有多难 2020-12-04 17:04

I am reducing the dimensionality of a Spark DataFrame with a PCA model in PySpark (using the spark.ml library) as follows:

4 answers
  •  感动是毒
    2020-12-04 17:28

    In Spark 2.2+ you can easily get the explained variance as follows:

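    from pyspark.ml.feature import PCA, VectorAssembler

    # feature_cols (a list of numeric column names) and input_df are
    # placeholders for your own columns and source DataFrame.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    df = assembler.transform(input_df).select("features")

    pca = PCA(k=10, inputCol="features", outputCol="pcaFeatures")
    model = pca.fit(df)

    # explainedVariance holds the proportion of variance explained by each of the k components
    sum(model.explainedVariance)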
    

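To see what "explained variance" means independently of Spark, the following NumPy-only sketch computes it from first principles on a hypothetical toy matrix: eigendecompose the sample covariance matrix, sort the eigenpairs by eigenvalue, and divide each of the top k eigenvalues by the sum of all eigenvalues. This is the textbook definition, not Spark's code path; one would expect the ratios to agree with Spark's `explainedVariance` up to floating-point error and the eigenvectors to agree with `PCAModel.pc` up to sign, but treat that as an assumption to check on your own data.

```python
import numpy as np

# Hypothetical toy data: rows are observations, columns are features.
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [4.0, 1.0, 3.0],
    [3.0, 2.0, 2.0],
    [0.0, 1.0, 0.5],
])
k = 2

# Sample covariance matrix of the features (columns), shape (3, 3).
cov = np.cov(X, rowvar=False)

# eigh returns eigenvalues of a symmetric matrix in ascending order;
# reverse so the direction with the largest variance comes first.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Each column of eigvecs is an eigenvector (principal component).
components = eigvecs[:, :k]

# Proportion of total variance captured by each kept component: lambda_i / sum(lambda).
ratios = eigvals[:k] / eigvals.sum()

print("components:\n", components)
print("explained variance ratios:", ratios)
print("total explained by the k components:", ratios.sum())
```

Because the denominator is the sum over all eigenvalues, the total for k components is below 1 whenever k is smaller than the number of features.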