Need help with creating PDF from HTML using itextsharp

落爺英雄遲暮 submitted on 2019-11-28 09:29:00

For later versions of iTextSharp:

With iTextSharp, you can use the iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList() method to create a PDF from HTML.

ParseToList() takes a TextReader (an abstract class) as its HTML source. That means you can pass it a StringReader or a StreamReader, because both derive from TextReader. I used a StringReader and was able to generate PDFs from simple markup. When I tried HTML returned from real web pages, however, I got errors on all but the simplest pages. Even on the simplest page I retrieved (http://black.ea.com/), the content of the page's 'head' tag was rendered onto the PDF. So I think HTMLWorker.ParseToList() is picky about the formatting of the HTML it parses.
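The point about TextReader can be shown without iTextSharp at all. The sketch below is a minimal illustration. `ReadMarkup` is a hypothetical stand-in for any method that accepts a TextReader, such as HTMLWorker.ParseToList(). The method only sees the base type, so a StringReader (in-memory markup) and a StreamReader (markup from a stream) can be passed interchangeably.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextReaderDemo
{
    // Hypothetical stand-in for an API that accepts a TextReader,
    // e.g. HTMLWorker.ParseToList(TextReader, StyleSheet).
    // It only depends on the abstract base type.
    public static string ReadMarkup(TextReader reader)
    {
        return reader.ReadToEnd();
    }

    public static void Main()
    {
        string html = "<p>Hello World</p>";

        // StringReader: markup already held in memory, e.g. a downloaded page.
        using (TextReader fromString = new StringReader(html))
        {
            Console.WriteLine(ReadMarkup(fromString));
        }

        // StreamReader: markup read from a stream. A MemoryStream is used
        // here so the sketch runs without needing a real file on disk.
        byte[] bytes = Encoding.UTF8.GetBytes(html);
        using (TextReader fromStream = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            Console.WriteLine(ReadMarkup(fromStream));
        }
    }
}
```

To read markup from a file on disk, `new StreamReader("page.html")` (the path is hypothetical) works the same way as the MemoryStream version.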

Anyway, if you want to try it, here's the test code I used:

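using System;
using System.IO;
using System.Net;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;

// Download content from a very, very simple "Hello World" web page.
string download;
using (WebClient client = new WebClient()) {
    download = client.DownloadString("http://black.ea.com/");
}

// A4 page with left, right, top and bottom margins (in points).
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
try {
    using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
        PdfWriter.GetInstance(document, fs);
        using (StringReader stringReader = new StringReader(download)) {
            // Older releases return ArrayList and 5.x returns List<IElement>;
            // 'var' compiles against either.
            var parsedList = HTMLWorker.ParseToList(stringReader, null);
            document.Open();
            foreach (object item in parsedList) {
                document.Add((IElement)item);
            }
            document.Close();
        }
    }
} catch (Exception exc) {
    Console.Error.WriteLine(exc.Message);
}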
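One possible workaround for the 'head' content ending up in the PDF is to pass only the body of the page to the parser. This is my own assumption and has not been tested against HTMLWorker. The sketch below uses a hypothetical `ExtractBody` helper that does a naive, case-insensitive string search for `<body ...>` and `</body>`. It is not a real HTML parser. If no body tag is found, it returns the input unchanged.

```csharp
using System;
using System.IO;

public static class HtmlBody
{
    // Hypothetical helper: returns only the markup between <body ...> and
    // </body>, so <head> content (title, scripts, styles) is not parsed.
    // Naive string search; falls back to the whole input if no <body> tag.
    public static string ExtractBody(string html)
    {
        int bodyStart = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (bodyStart < 0) return html;

        // Skip past the end of the opening tag, including any attributes.
        int contentStart = html.IndexOf('>', bodyStart);
        if (contentStart < 0) return html;
        contentStart++;

        int bodyEnd = html.IndexOf("</body>", contentStart, StringComparison.OrdinalIgnoreCase);
        if (bodyEnd < 0) bodyEnd = html.Length;

        return html.Substring(contentStart, bodyEnd - contentStart);
    }

    public static void Main()
    {
        string page = "<html><head><title>Hi</title></head>"
                    + "<BODY class=\"x\"><p>Hello World</p></BODY></html>";

        // The result can be wrapped in a StringReader, just like the
        // downloaded page in the test code, before calling ParseToList().
        using (TextReader reader = new StringReader(ExtractBody(page)))
        {
            Console.WriteLine(reader.ReadToEnd()); // prints <p>Hello World</p>
        }
    }
}
```

Styles defined in the head are dropped too, which may or may not matter given how little CSS HTMLWorker appears to handle.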

I couldn't find any documentation on which HTML constructs HTMLWorker.ParseToList() supports. If you do, please post it here; I'm sure a lot of people would be interested.

For older versions of iTextSharp, you can use the iTextSharp.text.html.HtmlParser.Parse method to create a PDF from HTML.

Here's a snippet demonstrating this:

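using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html;

Document document = new Document(PageSize.A4, 80, 50, 30, 65);
try {
    using (FileStream fs = new FileStream("TestOutput.pdf", FileMode.Create)) {
        PdfWriter.GetInstance(document, fs);
        // Parse the (strict XHTML) file and write its content into the PDF.
        HtmlParser.Parse(document, "YourHtmlDocument.html");
    }
} catch (Exception exc) {
    Console.Error.WriteLine(exc.Message);
}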

The one problem (a major one for me) is that the HTML must be strictly XHTML compliant.
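Because of that requirement, it can help to check the markup before handing it to HtmlParser. The sketch below is a minimal illustration using only System.Xml.Linq. It tests whether the markup is well-formed XML, which is a necessary condition for XHTML but not a sufficient one: the check does not validate against the XHTML DTD. `XhtmlCheck` and `IsWellFormed` are hypothetical names.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheck
{
    // Returns true if the markup parses as well-formed XML.
    // This is not full XHTML validation (no DTD/schema check).
    public static bool IsWellFormed(string markup, out string error)
    {
        try
        {
            XDocument.Parse(markup);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            // Typical causes: unclosed tags like <br>, unquoted attributes.
            error = ex.Message;
            return false;
        }
    }

    public static void Main()
    {
        string good = "<html><body><p>Hello<br/>World</p></body></html>";
        string bad = "<html><body><p>Hello<br>World</p></body></html>";

        Console.WriteLine(IsWellFormed(good, out _)); // prints True
        Console.WriteLine(IsWellFormed(bad, out string err)); // prints False
        Console.WriteLine(err);
    }
}
```

A page that passes this check may still trip up HtmlParser if it uses tags the parser doesn't handle. A failing page, however, will almost certainly be rejected.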

Good luck!
