I have a text file. I need to get a list of sentences.
How can this be implemented? There are a lot of subtleties, such as a dot being used in abbreviations.
The Natural Language Toolkit (nltk.org) has what you need. This group posting indicates this does it:
import nltk.data tokenizer = nltk.data.load('tokenizers/punkt/english.pickle') fp = open("test.txt") data = fp.read() print '\n-----\n'.join(tokenizer.tokenize(data))
(I haven't tried it!)