I have a number of PDF documents, which I have read into a corpus with the tm library. How can I break the corpus into sentences?
The openNLP package has had some major changes. The bad news is that it looks very different from how it used to. The good news is that it is more flexible, and the functionality you enjoyed before is still there; you just have to find it.
This will give you what you're after:
?Maxent_Sent_Token_Annotator
Just work through the example and you'll see the functionality you're looking for.
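For a tm corpus, the same idea can be applied one document at a time. The following is only a minimal sketch, not a definitive recipe: the two-document corpus and the helper name `split_into_sentences` are made up for illustration, and it assumes the tm, NLP and openNLP packages are installed with a working Java setup and the default English sentence model available.

```r
library(tm)       # corpus handling
library(NLP)      # as.String(), annotate()
library(openNLP)  # Maxent_Sent_Token_Annotator()

# Toy corpus standing in for the PDFs that were read in (illustrative data only).
corpus <- VCorpus(VectorSource(c(
  "This is the first sentence. Here is a second one.",
  "Another document starts here. It also has two sentences."
)))

# Create the sentence annotator once and reuse it for every document.
sent_annotator <- Maxent_Sent_Token_Annotator()

# Hypothetical helper: returns the sentences of one tm document.
split_into_sentences <- function(doc) {
  # A document's content may span several lines, so collapse it into one string.
  txt <- as.String(paste(content(doc), collapse = " "))
  # Sentence boundaries come back as annotations with start/end offsets.
  boundaries <- NLP::annotate(txt, sent_annotator)
  # Subsetting the String by the annotations extracts the sentence text.
  txt[boundaries]
}

# One character vector of sentences per document.
sentences <- lapply(seq_along(corpus), function(i) split_into_sentences(corpus[[i]]))
sentences
```

Calling `NLP::annotate` explicitly avoids a clash if another attached package (ggplot2, for example) also exports a function named `annotate`.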