I have a corpus of around 8 million news articles and I need to get a TF-IDF representation of them as a sparse matrix. I have been able to do that with scikit-learn for a smaller number of samples, but it does not seem feasible for a corpus this large, since the whole thing would have to fit in memory.
Gensim has an efficient tf-idf model and does not need to have everything in memory at once. Your corpus simply needs to be an iterable that yields one document at a time, so the full corpus never has to sit in memory.
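Here is a minimal sketch of that streaming pattern. It assumes one article per line in a hypothetical `articles.txt` and uses naive whitespace tokenization; `corpus2csc` then materializes the result as a SciPy sparse matrix (terms × docs), which is what you asked for:

```python
from gensim.corpora import Dictionary
from gensim.models import TfidfModel
from gensim.matutils import corpus2csc

class CorpusStream:
    """Streams documents one at a time, so the corpus never sits in memory."""
    def __init__(self, path, dictionary=None):
        self.path = path              # hypothetical file: one article per line
        self.dictionary = dictionary

    def iter_tokens(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield line.lower().split()   # naive tokenization for the sketch

    def __iter__(self):
        # Yield bag-of-words vectors lazily, one document per iteration.
        for tokens in self.iter_tokens():
            yield self.dictionary.doc2bow(tokens)

stream = CorpusStream("articles.txt")
dictionary = Dictionary(stream.iter_tokens())   # first pass: build the vocabulary
stream.dictionary = dictionary

tfidf = TfidfModel(stream)                      # second pass: collect document frequencies
# Apply the model lazily, then build the sparse matrix (shape: num_terms x num_docs).
sparse = corpus2csc(tfidf[stream], num_terms=len(dictionary))
```

The resulting `csc_matrix` has one column per document; transpose it if you want the usual docs × terms orientation. Only the sparse matrix itself is held in memory, not the raw text.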
According to the comments in Gensim's make_wiki script, it runs over the whole of Wikipedia in about 50 minutes on a laptop, so 8 million articles should be well within reach.