How to make TF-IDF matrix dense?

爱一瞬间的悲伤 2020-12-18 22:12

I am using TfidfVectorizer to convert a collection of raw documents to a matrix of TF-IDF features, which I then plan to input into a k-means algorithm (which I will impleme

1 answer
  • 2020-12-18 22:46

    This should be as simple as:

    dense = X.toarray()
    

    TfidfVectorizer.fit_transform() returns a SciPy csr_matrix (Compressed Sparse Row matrix), which has a toarray() method for exactly this purpose. SciPy has several sparse matrix formats, and they all provide a toarray() method.
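
A minimal sketch of the conversion end to end, in case it helps to see it in context. The three-document corpus and the variable names are made up for illustration; only TfidfVectorizer, fit_transform() and toarray() come from the answer above.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix in CSR format

dense = X.toarray()                 # ordinary 2-D numpy.ndarray

print(sparse.issparse(X), X.format)       # confirms X is sparse, CSR
print(type(dense).__name__, dense.shape)  # ndarray, (n_documents, n_features)
print(X.nnz, np.count_nonzero(dense))     # same number of nonzero entries
```

Each row of dense is one document and each column one vocabulary term, so the array can be passed to any code that expects a plain NumPy matrix.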

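    Note that for a large matrix, the dense array will use far more memory than the sparse one, because it stores every entry, zeros included (n_documents × n_features values at 8 bytes each for the default float64 dtype). It is generally best to keep the matrix sparse for as long as possible.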
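
To get a rough feel for the difference, the sketch below compares the two footprints without actually allocating the full dense array, and then densifies only one batch of rows at a time. The matrix is a random stand-in for a TF-IDF matrix, and the sizes, the csr_nbytes helper and batch_size are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for a TF-IDF matrix: 2,000 documents, 1,000 terms,
# about 1% of the entries nonzero.
X = sparse.random(2_000, 1_000, density=0.01, format="csr",
                  dtype=np.float64, random_state=0)

def csr_nbytes(m):
    """Bytes held by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

sparse_bytes = csr_nbytes(X)
# Size the dense array would have, computed without allocating it.
dense_bytes = X.shape[0] * X.shape[1] * X.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.2f} MB")

# Staying sparse for as long as possible: convert one slice of rows at a time.
batch_size = 500
rows_seen = 0
for start in range(0, X.shape[0], batch_size):
    chunk = X[start:start + batch_size].toarray()  # at most batch_size dense rows
    rows_seen += chunk.shape[0]                    # placeholder for real work
```

If the downstream code can consume rows in batches like this, the full dense matrix never has to exist in memory at once.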
