I have found HTML Tidy (www.html-tidy.org) to do the best job of tidying and cleaning HTML.
The binaries are available here: http://binaries.html-tidy.org
There are also wrappers for HTML Tidy in many languages. I use one called TidyHtml5ManagedRepack for C#.
My specific need is to clean poorly formed HTML and then compare it with the same or similar HTML after it has been adjusted via JavaScript in different browsers. HTML Tidy lets me clean the HTML into a normalised state, so I can compare it against the browser-adjusted versions and be confident that they are most likely the same.
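
A minimal sketch of that clean-then-compare workflow, assuming the `tidy` command-line binary is installed and on the PATH. It is written in Python and shells out to the binary rather than going through the C# wrapper, and the function names (`tidy_html`, `html_diff`) are made up for illustration. The Tidy options used are only one plausible choice and will likely need tuning for your markup.

```python
import difflib
import subprocess


def tidy_html(html: str, tidy_bin: str = "tidy") -> str:
    """Pipe `html` through the HTML Tidy CLI and return the cleaned markup.

    Assumes a `tidy` executable is available; `tidy_bin` is a hypothetical
    parameter for pointing at a different path.
    """
    result = subprocess.run(
        [
            tidy_bin,
            "-quiet",                  # suppress non-essential output
            "-utf8",                   # read and write UTF-8
            "-indent",                 # consistent indentation
            "--wrap", "0",             # no line wrapping, keeps diffs stable
            "--tidy-mark", "no",       # no generator <meta> tag
            "--show-warnings", "no",   # keep stderr quiet
        ],
        input=html,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )
    # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
    if result.returncode >= 2:
        raise RuntimeError(f"tidy failed: {result.stderr.strip()}")
    return result.stdout


def html_diff(a: str, b: str, normalise=tidy_html) -> list[str]:
    """Normalise both inputs, then return a unified diff (empty if equal)."""
    a_lines = normalise(a).splitlines()
    b_lines = normalise(b).splitlines()
    return list(difflib.unified_diff(a_lines, b_lines, lineterm=""))
```

Calling `html_diff(original_html, browser_html)` returns an empty list when the two documents are identical after tidying. A non-empty result shows the lines that still differ, which is a hint that the browser changed something meaningful rather than just formatting.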