NLTK corpus reader paragraph
Question

I copied the content of a Word document (.docx) into a .txt file and loaded it with an NLTK corpus reader to count its paragraphs. The reader returns almost 30 paragraphs as a single paragraph. After I manually entered a line break in the .txt file, it returned 30 paragraphs.

    import nltk

    corpusReader = nltk.corpus.reader.plaintext.PlaintextCorpusReader(".", "d.txt")
    print("Paragraphs = %d" % len(corpusReader.paras()))

Is it possible for PlaintextCorpusReader to read .docx? While copy pasting from .docx to
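
For context on the behaviour described in the question: by default PlaintextCorpusReader treats blank lines as paragraph boundaries, and it reads plain text files only, so a .docx file (a zipped XML package) would need to be converted to text first. The sketch below is a minimal, standard-library-only illustration of the blank-line rule, assuming that each Word paragraph ended up as a single line in the pasted .txt file. The helper names `blank_line_paragraphs` and `one_paragraph_per_line` are hypothetical and are not part of NLTK.

```python
import re


def blank_line_paragraphs(text):
    """Split text into paragraphs wherever a blank line occurs.

    This imitates the blank-line rule only; it is not NLTK's own code.
    """
    return [block for block in re.split(r"\n\s*\n", text) if block.strip()]


def one_paragraph_per_line(text):
    """Rewrite text so that every non-empty line is separated by a blank line.

    Assumption: each pasted line corresponds to exactly one paragraph.
    """
    lines = [line.strip() for line in text.splitlines()]
    return "\n\n".join(line for line in lines if line) + "\n"


# Hypothetical pasted content: three paragraphs separated by single newlines.
pasted = "First paragraph.\nSecond paragraph.\nThird paragraph.\n"

# Without blank lines the whole text counts as one block.
print(len(blank_line_paragraphs(pasted)))

# After normalisation each line becomes its own block.
print(len(blank_line_paragraphs(one_paragraph_per_line(pasted))))
```

Writing the output of `one_paragraph_per_line` back to d.txt before loading it with the corpus reader should give one paragraph per original line, provided the one-line-per-paragraph assumption holds for the pasted text.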