I'd like to extract the text from an HTML file using Python. I want essentially the same output I would get if I copied the text from a browser and pasted it into Notepad.
The comment suggesting LibreOffice Writer has merit, since the application supports Python macros. That seems to help in two ways: it answers this question, and it adds to the pool of macros available for LibreOffice. If this is a one-off job rather than part of a larger production program, opening the HTML in Writer and saving the page as text would seem to resolve the issues discussed here.
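If doing this by hand gets tedious, the same Writer conversion can probably be scripted from Python by calling LibreOffice in headless mode. This is only a rough sketch, assuming LibreOffice is installed and its `soffice` binary is on your PATH. The helper names `build_convert_command` and `html_to_text` are made up for illustration, and the encoding of the file written by the `txt:Text` filter may depend on your local setup, so undecodable bytes are replaced rather than raising an error.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(html_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless HTML-to-text conversion."""
    return [
        soffice,
        "--headless",                # run without opening the GUI
        "--convert-to", "txt:Text",  # plain-text export filter
        "--outdir", str(out_dir),
        str(html_path),
    ]


def html_to_text(html_path, out_dir="."):
    """Convert html_path with LibreOffice and return the resulting text.

    Assumes soffice is on PATH and writes <stem>.txt into out_dir.
    """
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(html_path, out_dir), check=True)
    out_file = Path(out_dir) / (Path(html_path).stem + ".txt")
    # The output encoding is an assumption; bad bytes are replaced.
    return out_file.read_text(encoding="utf-8", errors="replace")
```

Calling `html_to_text("page.html", "out")` should leave `out/page.txt` on disk and return its contents. How close that is to a browser copy and paste depends on how Writer renders the page.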