Sparse Principal Component Analysis using sklearn

老子叫甜甜 · Submitted 2019-12-25 00:02:47

Question


I'm trying to replicate an application from this paper, where the authors download the 20 newsgroups data and use SPCA to extract the principal components that, in some sense, best describe the text corpus [see section 4.1]. This is for a class project on high-dimensional data, where we were asked to pick a topic and replicate/apply it.

The output should be K principal components, which each have a short list of words that all intuitively correspond to a certain theme (for example, the paper finds the first PC is all about politics and religion).

From my research it seems like the best way to reproduce the application from this paper is using this algorithm: sklearn.decomposition.MiniBatchSparsePCA.

I have found only one example of how this algorithm works, here.

So my question is this: Is it, in principle, possible to follow the steps in the above linked example, using text data, to reproduce the application from section 4.1 of the paper linked in the first paragraph?

If it is, I would then be able to ask more concrete questions regarding the code.

Source: https://stackoverflow.com/questions/47906412/sparse-principal-component-analysis-using-sklearn
