I am using TfidfVectorizer to convert a collection of raw documents to a matrix of TF-IDF features, which I then plan to input into a k-means algorithm (which I will impleme
This should be as simple as:
dense = X.toarray()  # X is the sparse matrix returned by fit_transform()
TfidfVectorizer.fit_transform() returns a SciPy csr_matrix (Compressed Sparse Row matrix), which has a toarray() method just for this purpose. There are several formats of sparse matrices in SciPy, but they all have a .toarray() method.
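For context, here is a minimal end-to-end sketch of that conversion. The three documents are made up for illustration, and the variable names (docs, vectorizer, X, dense) are my own choices rather than anything from the question:

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up corpus, purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "k-means clusters documents by similarity",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # SciPy sparse matrix, one row per document

dense = X.toarray()  # plain NumPy ndarray with the same shape

print(type(X).__name__, X.shape)
print(type(dense).__name__, dense.shape)
```

Each row of dense is the TF-IDF vector for one document, so it can be passed directly to code that expects an ordinary 2-D array.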
Note that for a large matrix, the dense array will use far more memory than the sparse one, because it stores every zero explicitly. It is generally better to keep the matrix sparse for as long as possible.
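To get a feel for the difference, the sketch below compares the two footprints without ever allocating the full dense array. The random matrix is only a stand-in for real TF-IDF output, and its size, the 1% density, and batch_size are assumptions picked for illustration. The loop at the end shows one possible way to densify only a block of rows at a time if a hand-written k-means needs dense input:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for a TF-IDF matrix:
# 2,000 documents, 5,000 terms, 1% of the cells non-zero.
X = sparse.random(2000, 5000, density=0.01, format="csr", dtype=np.float64)

# CSR stores only the non-zero values plus two index arrays.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# A dense array stores every cell, zero or not (computed here, not allocated).
dense_bytes = X.shape[0] * X.shape[1] * X.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.1f} MB, dense: {dense_bytes / 1e6:.1f} MB")

# Densify one block of rows at a time instead of the whole matrix.
batch_size = 500  # assumed value; tune to available memory
for start in range(0, X.shape[0], batch_size):
    block = X[start:start + batch_size].toarray()
    # ... process this dense block (e.g. compute distances to centroids) ...
```

Row slicing is efficient on CSR matrices, which is the format fit_transform() returns, so the batched loop avoids holding more than one dense block in memory at once.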