I'm working on a small Python script to clean up HTML documents. It works by accepting a list of tags to KEEP and then parsing through the HTML code, trashing tags that are not in that list.
<TAG\b[^>]*>(.*?)</TAG>
Matches the opening and closing pair of a specific HTML tag.
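As a minimal sketch, the specific-tag pattern could be used from Python roughly like this. The helper name `find_tag_pairs`, the tag name `b`, and the sample string are illustrative assumptions, not part of the original script. `re.IGNORECASE` lets the pattern match tags written in either case, and `re.DOTALL` lets the content span several lines.

```python
import re

def find_tag_pairs(html, tag):
    """Return the inner content of each <tag ...>...</tag> pair (hypothetical helper)."""
    pattern = re.compile(
        # re.escape guards against regex metacharacters in the tag name
        r"<{0}\b[^>]*>(.*?)</{0}>".format(re.escape(tag)),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.findall(html)

sample = '<p>Some <b class="x">bold</b> and <B>more bold</B> text.</p>'
print(find_tag_pairs(sample, "b"))  # ['bold', 'more bold']
```

Because `(.*?)` is lazy, a pair stops at the first matching closing tag, so nested tags of the same name are not handled correctly.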
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
Will match the opening and closing pair of any HTML tag.
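A short sketch of the any-tag pattern in Python might look like the following. The names `ANY_TAG_PAIR` and `list_tag_pairs` and the sample markup are assumptions made for illustration. The backreference `\1` forces the closing tag to carry the same name as the opening tag, and `re.IGNORECASE` is needed because the character class only lists uppercase letters.

```python
import re

# Group 1 is the tag name, group 2 is the content between the pair.
ANY_TAG_PAIR = re.compile(
    r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>",
    re.IGNORECASE | re.DOTALL,
)

def list_tag_pairs(html):
    """Return (tag name, inner content) for each outermost pair found (hypothetical helper)."""
    return [(m.group(1).lower(), m.group(2)) for m in ANY_TAG_PAIR.finditer(html)]

sample = "<h1>Title</h1><p>Some <i>text</i></p>"
print(list_tag_pairs(sample))  # [('h1', 'Title'), ('p', 'Some <i>text</i>')]
```

Matches do not overlap, so the `<i>` pair inside the `<p>` content is not reported separately; it would have to be found by running the pattern again on the captured content.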