I'm using BeautifulSoup (version '4.3.2' with Python 3.4) to convert HTML documents to text. The problem I'm having is that sometimes web pages have newline characters
I would take a look at python-markdownify. It converts HTML into fairly readable text in Markdown format.
It is available on PyPI: https://pypi.python.org/pypi/markdownify/0.4.0
and on GitHub: https://github.com/matthewwithanm/python-markdownify