How to get the wikipedia corpus text with punctuation by using gensim wikicorpus?
问题 I'm trying to get the text with its punctuation as it is important to consider the latter in my doc2vec model. However, the wikicorpus retrieve only the text. After searching the web I found these pages: Page from gensim github issues section. It was a question by someone where the answer was to subclass WikiCorpus (answered by Piskvorky). Luckily, in the same page, there was a code representing the suggested 'subclass' solution. The code was provided by Rhazegh. (link) Page from