Flying Saucer not recognizing html entities

Submitted by ≡放荡痞女 on 2019-12-13 03:23:28

Question


I'm trying to use an HTML file as a template for a PDF, but Flying Saucer isn't recognizing the HTML5 named entities (&trade;, &nbsp;, etc.). If I replace them with their hexadecimal character references, the program runs fine.

My code is as follows:

    public static InputStream create(String content) throws PDFUtilException {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            ITextRenderer iTextRenderer = new ITextRenderer();
            iTextRenderer.getSharedContext()
                    .setReplacedElementFactory(new MediaReplacedElementFactory(
                            iTextRenderer.getSharedContext().getReplacedElementFactory()));

            iTextRenderer.setDocumentFromString(closeOutTags(content), null);
            iTextRenderer.layout();
            iTextRenderer.createPDF(baos);
            return new ByteArrayInputStream(baos.toByteArray());
        } catch (IOException | DocumentException e) {
            throw new PDFUtilException("Unable to create PDF", e);
        }
    }

Thanks,

Oliver
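As the question notes, replacing the named entities with hexadecimal character references makes the input acceptable. Below is a minimal sketch of automating that, assuming the templates only use a small, known set of entities. The class `EntityReplacer`, the method `replaceNamedEntities`, and the three-entry table are hypothetical and illustrative, not part of Flying Saucer:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntityReplacer {
    // Hypothetical, deliberately tiny table: extend it with whatever entities your templates use.
    private static final Map<String, String> NAMED_TO_NUMERIC = new LinkedHashMap<>();

    static {
        NAMED_TO_NUMERIC.put("&trade;", "&#x2122;");
        NAMED_TO_NUMERIC.put("&nbsp;", "&#xA0;");
        NAMED_TO_NUMERIC.put("&copy;", "&#xA9;");
    }

    /** Rewrites known named entities as numeric character references, which XML accepts without declarations. */
    public static String replaceNamedEntities(String html) {
        String result = html;
        for (Map.Entry<String, String> e : NAMED_TO_NUMERIC.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints <p>Brand&#x2122;&#xA0;Inc</p>
        System.out.println(replaceNamedEntities("<p>Brand&trade;&nbsp;Inc</p>"));
    }
}
```

It could be applied just before rendering, e.g. `iTextRenderer.setDocumentFromString(replaceNamedEntities(closeOutTags(content)), null);`. A plain string replacement leaves any entity missing from the table untouched, so such an entity would still fail to parse.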


Answer 1:


Michael is correct in saying that Flying Saucer needs well-formed XML, but if your only problem is the predefined HTML entities (which aren't predefined in XML), you can declare them yourself at the beginning of your document, like so:

<!DOCTYPE html [
  <!ENTITY % htmlentities SYSTEM "https://www.w3.org/2003/entities/2007/htmlmathml-f.ent">
  %htmlentities;
]>
<!-- your XHTML text following here -->

This pulls in the entity declarations from their official URL into the htmlentities parameter entity, then references that parameter entity, which includes (in effect, "executes") the pulled-in declarations. If you only need trade and nbsp, or if Flying Saucer won't let you access URLs on the net, you can also declare them manually:

<!DOCTYPE html [
  <!ENTITY trade "&#x02122;">
  <!ENTITY nbsp "&#x000A0;">
]>
<!-- your XHTML text following here -->
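The parameter-entity variant can be tried without network access. The sketch below writes two declarations to a local temporary file (a hypothetical stand-in for `htmlmathml-f.ent`), references it from the internal subset exactly as in the first DOCTYPE above, and parses the result with the JDK's built-in JAXP parser rather than Flying Saucer itself. It assumes Java 11+ and the JDK's default parser settings, which allow external entities to be loaded from `file:` URIs:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;

public class ParameterEntityDemo {
    /** Pulls declarations in through a parameter entity, parses, and returns the root element's text. */
    public static String demo() {
        Path ent = null;
        try {
            // Hypothetical stand-in for htmlmathml-f.ent: a local file holding two declarations.
            ent = Files.createTempFile("entities", ".ent");
            Files.writeString(ent, "<!ENTITY trade \"&#x2122;\">\n<!ENTITY nbsp \"&#xA0;\">\n");

            String xml = "<!DOCTYPE html [\n"
                    + "  <!ENTITY % htmlentities SYSTEM \"" + ent.toUri() + "\">\n"
                    + "  %htmlentities;\n" // includes the declarations read from the file
                    + "]>\n"
                    + "<html>Brand&trade;&nbsp;Inc</html>";

            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        } finally {
            if (ent != null) {
                try {
                    Files.deleteIfExists(ent);
                } catch (IOException ignored) {
                    // best-effort cleanup of the temporary file
                }
            }
        }
    }

    public static void main(String[] args) {
        // Expected: "Brand", the trademark sign, a no-break space, then "Inc".
        System.out.println(demo());
    }
}
```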

Now, if you actually have a proper HTML (not XHTML) file, you won't be able to feed it directly to an XML processor, because HTML uses markup features that XML doesn't support (for example, empty elements such as img written without an end tag, omitted tags, and minimized attributes). You can, however, use an SGML processor to first convert the HTML to XHTML (XML) and then run Flying Saucer on the resulting XML file. (SGML is a superset of both HTML and XML, and the original markup language on which both are based.) The process involves an HTML DTD grammar, such as the original W3C HTML 4 DTD (from 1999) or my HTML5 DTD on sgmljs.net, plus an SGML processor. Before going into details, though, first check whether merely adding the entity declarations described above solves your problem.
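Adding the declarations programmatically, rather than editing every template, might look like the sketch below. `withEntityDeclarations` is a hypothetical helper that prepends the manual DOCTYPE from above; it assumes the input has no DOCTYPE of its own, and it checks the result with the JDK's JAXP parser as a stand-in for the parser Flying Saucer uses:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class InlineEntityDeclarations {
    // Same declarations as the manual DOCTYPE in the answer.
    private static final String DOCTYPE = "<!DOCTYPE html [\n"
            + "  <!ENTITY trade \"&#x02122;\">\n"
            + "  <!ENTITY nbsp \"&#x000A0;\">\n"
            + "]>\n";

    /** Hypothetical helper: adds the DOCTYPE, keeping any XML declaration first. Assumes no existing DOCTYPE. */
    public static String withEntityDeclarations(String xhtml) {
        String trimmed = xhtml.stripLeading();
        int declEnd = trimmed.startsWith("<?xml") ? trimmed.indexOf("?>") : -1;
        if (declEnd >= 0) {
            // The XML declaration must stay at the very start, so the DOCTYPE goes right after it.
            return trimmed.substring(0, declEnd + 2) + "\n" + DOCTYPE + trimmed.substring(declEnd + 2);
        }
        return DOCTYPE + trimmed;
    }

    /** Parses with the JDK's JAXP parser and returns the root element's text content. */
    public static String parseText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        String doc = withEntityDeclarations("<html><body>Brand&trade;&nbsp;Inc</body></html>");
        System.out.println(parseText(doc).equals("Brand\u2122\u00A0Inc")); // true
    }
}
```

In the question's code this would become `iTextRenderer.setDocumentFromString(withEntityDeclarations(closeOutTags(content)), null);`, assuming Flying Saucer's XML parser honours internal-subset declarations as any conforming XML parser should.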




Answer 2:


I had never heard of Flying Saucer until today, but the first sentence of its documentation says "Flying Saucer is a pure-Java library for rendering arbitrary well-formed XML (or XHTML)", which strongly suggests that it expects well-formed XML input rather than HTML.
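One quick way to confirm that this is the problem is to parse the template with a plain XML parser before rendering it. Below is a minimal sketch using the JDK's JAXP `DocumentBuilder` (the class and method names are hypothetical). An undeclared entity such as `&trade;` or an unclosed HTML element such as `<br>` makes it report an error, while numeric character references pass:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Returns null if the markup is well-formed XML, otherwise a description of the first error. */
    public static String wellFormednessError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(wellFormednessError("<p>Brand&trade; Inc</p>"));      // reports the undeclared entity
        System.out.println(wellFormednessError("<p>line one<br>line two</p>"));  // reports the unclosed br element
        System.out.println(wellFormednessError("<p>Brand&#x2122; Inc</p>"));     // null
    }
}
```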



Source: https://stackoverflow.com/questions/56171601/flying-saucer-not-recognizing-html-entities
