Is there a recommended way to escape <, >, " and & characters when outputting HTML in plain Java code? (Other
While @dfa's answer of org.apache.commons.lang.StringEscapeUtils.escapeHtml is nice, and I have used it in the past, it should not be used for escaping HTML (or XML) attributes; otherwise the whitespace will be normalized (meaning all adjacent whitespace characters become a single space).
I know this because I have had bugs filed against my library (JATL) for attributes where whitespace was not preserved. Thus I have a drop-in (copy 'n' paste) class (some of which I stole from JDOM) that escapes attributes differently from element content.
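To illustrate the distinction, here is a minimal sketch of what such a class could look like. It is not the actual JATL or JDOM code: the class name `HtmlEscaper` and both method names are hypothetical, and it assumes attribute values are always written inside double quotes. Element content only needs <, > and & escaped, while attribute values additionally get the quote and the tab, newline and carriage return characters written as character references so a parser does not normalize them.

```java
// Hypothetical helper for illustration; not the JATL or JDOM implementation.
final class HtmlEscaper {

    private HtmlEscaper() {
    }

    /** Escapes text that goes between tags (element content). */
    static String escapeContent(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '&': sb.append("&amp;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Escapes text that goes inside a double-quoted attribute value. */
    static String escapeAttribute(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;"); break;
                case '>':  sb.append("&gt;"); break;
                case '&':  sb.append("&amp;"); break;
                case '"':  sb.append("&quot;"); break;
                // Whitespace other than a plain space is emitted as a
                // character reference so it survives attribute parsing.
                case '\t': sb.append("&#x9;"); break;
                case '\n': sb.append("&#xA;"); break;
                case '\r': sb.append("&#xD;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this sketch, `"<div data-note=\"" + HtmlEscaper.escapeAttribute("line one\nline two") + "\">"` produces `<div data-note="line one&#xA;line two">`, whereas `escapeContent` leaves the newline untouched.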
While proper attribute escaping may not have mattered as much in the past, it is becoming increasingly important given the growing use of HTML5's data- attributes.