How to strip HTML attributes except “src” and “alt” in Java
How do I strip all attributes from the HTML tags in a string, except "alt" and "src", using Java? And further: how do I get the values of all "src" attributes in the string? :)

You can:

- Implement a SAX parser;
- Build a document with a DOM parser, walk it, prune it, and then convert it back to HTML; or
- Use an identity transform in XSLT (assuming your HTML is in XHTML format or can be converted to that with, say, JTidy), with some additional cases to remove the attributes you don't want.

Whatever you do, don't try to do it with regular expressions.

OK, solved this somehow. Used the HTMLCleaner
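
For reference, the DOM route suggested above (parse, walk, prune, serialize) could look roughly like the sketch below. It uses only the JDK's built-in XML APIs and is not the HTMLCleaner solution mentioned in the thread. It assumes the input is already well-formed XML (for example XHTML produced by a tidy step) with a single root element, no DOCTYPE and no HTML-only entities such as `&nbsp;`. The class name `AttributeStripper` and the method `strip` are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: name and API are illustrative, not from any library.
final class AttributeStripper {
    // Attributes to keep; every other attribute is removed.
    private static final Set<String> KEEP = new HashSet<>(Arrays.asList("src", "alt"));

    /**
     * Parses well-formed XHTML, removes all attributes except src/alt,
     * appends every src value (in document order) to srcValues,
     * and returns the pruned markup as a string.
     */
    static String strip(String xhtml, List<String> srcValues) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            prune(doc.getDocumentElement(), srcValues);

            // A transformer created without a stylesheet just copies the tree.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException
                | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process input as XML", e);
        }
    }

    // Recursively removes unwanted attributes from el and its descendants.
    private static void prune(Element el, List<String> srcValues) {
        NamedNodeMap attrs = el.getAttributes();
        // Walk backwards: the map is live and shrinks as attributes are removed.
        for (int i = attrs.getLength() - 1; i >= 0; i--) {
            Attr a = (Attr) attrs.item(i);
            String name = a.getName().toLowerCase(Locale.ROOT);
            if (!KEEP.contains(name)) {
                el.removeAttributeNode(a);
            } else if (name.equals("src")) {
                srcValues.add(a.getValue());
            }
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                prune((Element) c, srcValues);
            }
        }
    }

    public static void main(String[] args) {
        List<String> srcs = new ArrayList<>();
        String in = "<div class=\"box\"><img src=\"a.png\" alt=\"A\" width=\"10\"/>"
                + "<p id=\"p1\">text</p></div>";
        System.out.println(strip(in, srcs)); // markup with only src/alt left
        System.out.println(srcs);            // collected src values
    }
}
```

Note that the output is serialized as XML, so empty elements come back self-closed (`<img ... />`), and an `xmlns` declaration on the root is treated like any other attribute and dropped. For real-world, non-well-formed HTML, a lenient parser such as the HTMLCleaner or JTidy options named in the thread would be needed in front of this step.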