Pyspark and PCA: How can I extract the eigenvectors of this PCA? How can I calculate how much variance they are explaining?

忘掉有多难 2020-12-04 17:04

I am reducing the dimensionality of a Spark DataFrame with a PCA model in PySpark (using the spark.ml library) as follows:

4 answers

感动是毒 (original poster)

2020-12-04 17:28

In Spark 2.2+ you can get the explained variance directly from the fitted model:

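from pyspark.ml.feature import PCA, VectorAssembler

# input_df and input_cols are placeholders: your original DataFrame and the list of its numeric column names
assembler = VectorAssembler(inputCols=input_cols, outputCol="features")
df = assembler.transform(input_df).select("features")

# k is the number of principal components to keep; it cannot exceed the number of input columns
pca = PCA(k=10, inputCol="features", outputCol="pcaFeatures")
model = pca.fit(df)

# Total fraction of the variance explained by the k retained components
sum(model.explainedVariance)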
