Java HTML->PDF Solutions?

Posted by 江枫思渺然 on 2019-12-06 02:29:04

Flying Saucer converts XHTML to PDF. It works well, but it is not fast, and it fails on even a slight error in your XHTML syntax (such as <br> where it should be <br/>).
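Because the input has to be well-formed XML, it can help to check the markup before handing it to the renderer. The following is a minimal sketch that uses only the JDK's SAX parser; the class name `XhtmlCheck` and the method `firstError` are hypothetical names made up for this illustration, not part of Flying Saucer. It assumes markup without a DOCTYPE, so named entities such as `&nbsp;` would be reported as errors.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as well-formed XML.
class XhtmlCheck {

    /** Returns null if the markup is well-formed, otherwise a description of the first error. */
    static String firstError(String xhtml) {
        try {
            // DefaultHandler rethrows fatal errors, which is what a well-formedness violation is.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (SAXException | ParserConfigurationException | IOException e) {
            return String.valueOf(e.getMessage());
        }
    }

    public static void main(String[] args) {
        // <br/> is well-formed; a bare <br> is never closed, so the parser rejects it.
        System.out.println(firstError("<p>line one<br/>line two</p>"));
        System.out.println(firstError("<p>line one<br>line two</p>"));
    }
}
```

A check like this only catches syntax problems; it says nothing about whether the CSS or layout is supported by the renderer.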

This is the article that got me started. Flying Saucer uses iText to produce the PDF; once you have the pipeline working, you only need to change the HTML and the output updates.

http://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html

There may be a better way; this is how I did it.

If your source HTML is styled with CSS and not necessarily well-formed, try the PD4ML library (free for non-profit use).

I can recommend JODConverter; it uses OpenOffice in headless mode.

1. Install OpenOffice or LibreOffice (on Linux, for example, "zypper install libreoffice").

2. Windows: add the program directory to the PATH variable so that "soffice" can be run from anywhere; for me it was "C:\Program Files (x86)\LibreOffice 4\program".

3. Linux: make sure the user that runs the Java process owns its home directory, because OpenOffice needs to store its configuration there. In my case Tomcat ran the process, but its home directory was owned by root.

4. Add the JODConverter library to your Java project:

<dependency>
    <groupId>com.artofsolving</groupId>
    <artifactId>jodconverter</artifactId>
    <version>2.2.1</version>
</dependency>
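The environment requirements in steps 2 and 3 can be checked from Java before a conversion is attempted. This is only a sketch using the JDK; `SofficePreflight` and `findOnPath` are hypothetical names, and the list of Windows suffixes is an assumption rather than a complete reproduction of how Windows resolves executables.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical pre-flight check for the setup steps above.
class SofficePreflight {

    /** Searches each directory of a PATH-style string for an executable with the given name. */
    static Optional<Path> findOnPath(String name, String pathValue) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            // Assumed suffixes: none (Linux/macOS) plus a few common Windows ones.
            for (String suffix : new String[] {"", ".exe", ".com", ".bat"}) {
                try {
                    Path candidate = Paths.get(dir, name + suffix);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException ignored) {
                    // Malformed PATH entry; skip it.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Optional<Path> soffice = findOnPath("soffice", System.getenv("PATH"));
        System.out.println(soffice.map(p -> "soffice found: " + p)
                                  .orElse("soffice is not on the PATH"));

        // OpenOffice stores its configuration under the home directory of the current user.
        Path home = Paths.get(System.getProperty("user.home"));
        System.out.println("home directory: " + home);
        System.out.println("  owner:    " + Files.getOwner(home).getName());
        System.out.println("  run as:   " + System.getProperty("user.name"));
        System.out.println("  writable: " + Files.isWritable(home));
    }
}
```

If the owner and the running user differ, or the directory is not writable, that matches the Tomcat situation described in step 3.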

Then convert:

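import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
import com.artofsolving.jodconverter.DefaultDocumentFormatRegistry;
import com.artofsolving.jodconverter.DocumentConverter;
import com.artofsolving.jodconverter.openoffice.connection.OpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.connection.SocketOpenOfficeConnection;
import com.artofsolving.jodconverter.openoffice.converter.OpenOfficeDocumentConverter;

// ensure OpenOffice is running; exec() returns immediately, so the listener on port 8100 may need a moment to start
String[] commands = new String[] {"soffice", "--headless", "--accept=socket,host=localhost,port=8100;urp;"};
Runtime.getRuntime().exec(commands);

// convert the HTML string to PDF bytes
String html = "<div>hey there</div>";
ByteArrayOutputStream pdfOutputStream = new ByteArrayOutputStream();
DefaultDocumentFormatRegistry defaultDocumentFormatRegistry = new DefaultDocumentFormatRegistry();
OpenOfficeConnection connection = new SocketOpenOfficeConnection(8100);
connection.connect();
try {
    DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
    converter.convert(IOUtils.toInputStream(html, StandardCharsets.UTF_8), defaultDocumentFormatRegistry.getFormatByFileExtension("html"),
                      pdfOutputStream, defaultDocumentFormatRegistry.getFormatByFileExtension("pdf"));
} finally {
    connection.disconnect(); // release the socket even if the conversion fails
}
byte[] pdfBytes = pdfOutputStream.toByteArray();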
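One practical caveat with the snippet above: the soffice process is started and the connection is opened right away, so the connect call can fail if the office process has not opened port 8100 yet. A possible workaround, sketched here with plain JDK sockets, is to poll the port first. `PortWait` and `waitForPort` are hypothetical names, and the timeout values are arbitrary assumptions.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: blocks until something accepts TCP connections on host:port.
class PortWait {

    /** Returns true once a connection succeeds, false if the timeout elapses first. */
    static boolean waitForPort(String host, int port, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            try (Socket socket = new Socket()) {
                // 500 ms connect timeout per attempt (arbitrary choice).
                socket.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException notListeningYet) {
                try {
                    Thread.sleep(250); // brief pause before the next attempt
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        } while (System.currentTimeMillis() < deadline);
        return false;
    }

    public static void main(String[] args) {
        // Port 8100 matches the --accept option used when starting soffice.
        boolean ready = waitForPort("localhost", 8100, 10_000);
        System.out.println(ready ? "office process is listening" : "gave up waiting");
    }
}
```

Calling `waitForPort("localhost", 8100, 10_000)` between starting soffice and `connection.connect()` would be one way to use it; a successful TCP connect only shows that the port is open, not that the office process is fully initialized.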

Using PhantomJS, you can convert HTML to PDF very easily:

import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;

public class Screenshot {

  public static final String SCRIPT = "var page = require('webpage').create();\n" +
          "page.open('@@URL@@', function() {\n" +
          "  page.render('@@FILE@@');\n" +
          "});\n";

  public static void main(String[] args) {

    final String url = args[0];
    final String file = args[1];
    final String script = SCRIPT.replace("@@URL@@", url).replace("@@FILE@@", file);

    final DesiredCapabilities capabilities = new DesiredCapabilities();
    capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                               "/path/to/phantomjs/bin/phantomjs");
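    // Create the driver before the try block so that it is in scope in finally.
    final PhantomJSDriver phantomJSDriver = new PhantomJSDriver(capabilities);
    try {
      phantomJSDriver.executePhantomJS(script);
    } finally {
      phantomJSDriver.quit();  // end the session and stop the PhantomJS process
    }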
  }

}

If the filename ends with .pdf, the web page is saved as a PDF. PhantomJS also supports PNG, JPG and GIF output.

This is a very simple example; more generally, the screenshot process is very customizable (set the viewport size, enable/disable JavaScript, etc.). See PhantomJS's page on screen capturing for more info.
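As an illustration of that customization, the script can be generated with a viewport and paper size instead of being a fixed template. This is a sketch, not part of the original answer: `RenderScript`, `jsString` and `build` are hypothetical names, and the A4/portrait/1cm values are arbitrary. `viewportSize` and `paperSize` are properties of the PhantomJS web page object. Quoting the arguments also avoids a problem with the plain `replace` approach, which produces a broken script if the URL or file name contains a single quote.

```java
// Hypothetical builder for the PhantomJS render script.
class RenderScript {

    /** Wraps a value in single quotes, escaping characters that would end the JavaScript literal. */
    static String jsString(String value) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    /** Builds a script that sets the viewport and paper size, then renders the page to a file. */
    static String build(String url, String file, int width, int height) {
        return "var page = require('webpage').create();\n"
             + "page.viewportSize = { width: " + width + ", height: " + height + " };\n"
             // paperSize only affects PDF output; the values here are arbitrary examples.
             + "page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };\n"
             + "page.open(" + jsString(url) + ", function() {\n"
             + "  page.render(" + jsString(file) + ");\n"
             + "});\n";
    }

    public static void main(String[] args) {
        System.out.print(build("http://example.com/", "out.pdf", 1280, 800));
    }
}
```

The resulting string can be passed to `executePhantomJS` in place of the template-based script.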

The JavaFX WebKit browser can be used for HTML-to-PDF conversion. On Windows, install the PDF24 printer driver; on Linux, use cups-pdf. After installation, use the print method of WebEngine.

If using an external library is OK for you, you could easily use ui4j to print a web page to PDF.
