How to make a Jsoup whitelist to accept certain attribute content

后端 未结 1 758
广开言路
广开言路 2020-12-17 03:01

I\'m using Jsoup with relaxed whitelist. It seems perfect but I would like to keep the embedded images tags like \"\".

Is

相关标签:
1条回答
  • 2020-12-17 03:52

    You can extend Whitelist and override isSafeAttribute to perform custom checks. As there's no way to extend Whitelist.relaxed() directly, you'll have to copy some code to set up the same list:

    public class RelaxedPlusDataBase64Images extends Whitelist {
        public RelaxedPlusDataBase64Images() {
            //copied from Whitelist.relaxed()
            addTags("a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                    "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                    "i", "img", "li", "ol", "p", "pre", "q", "small", "strike", "strong",
                    "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                    "ul");
            addAttributes("a", "href", "title");
            addAttributes("blockquote", "cite");
            addAttributes("col", "span", "width");
            addAttributes("colgroup", "span", "width");
            addAttributes("img", "align", "alt", "height", "src", "title", "width");
            addAttributes("ol", "start", "type");
            addAttributes("q", "cite");
            addAttributes("table", "summary", "width");
            addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width");
            addAttributes("th", "abbr", "axis", "colspan", "rowspan", "scope", "width");
            addAttributes("ul", "type");
            addProtocols("a", "href", "ftp", "http", "https", "mailto");
            addProtocols("blockquote", "cite", "http", "https");
            addProtocols("cite", "cite", "http", "https");
            addProtocols("img", "src", "http", "https");
            addProtocols("q", "cite", "http", "https");
        }
    
        @Override
        protected boolean isSafeAttribute(String tagName, Element el, Attribute attr) {
            return ("img".equals(tagName)
                    && "src".equals(attr.getKey())
                    && attr.getValue().startsWith("data:;base64")) ||
                super.isSafeAttribute(tagName, el, attr);
        }
    }
    

    As you haven't provided the code you're using to parse or the HTML you're sanitizing, I haven't tested this.

    0 讨论(0)
提交回复
热议问题