iTextSharp HTMLWorker incorrectly converts UTF-8 encoded HTML

Submitted by 我怕爱的太早我们不能终老 on 2019-12-13 15:51:35

Question


I apologize for my English; I hope you can help me. I have this string, produced by an XSLT transformation:

"<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<html>\r\n  <head />\r\n  <body style=\"font-family: Verdana, sans-serif;\">\r\n  ĚŠČŘŽŘÝŽÝÁÁÍÉŮ ěščřžýáíéůú\r\n       </body>\r\n</html>"

and I am trying to convert this HTML string to PDF with the following code:

private static byte[] CreatePdfUsingXslt(string htmlText)
{
    var msOutput = new MemoryStream();
    var reader = new StringReader(htmlText);

    var document = new Document(PageSize.A4, 30, 30, 30, 30);
    var pdfWriter = PdfWriter.GetInstance(document, msOutput);
    var worker = new HTMLWorker(document);

    document.Open();
    worker.StartDocument();
    worker.Parse(reader);
    worker.EndDocument();
    worker.Close();
    document.Close();

    // ToArray() returns only the bytes written; GetBuffer() would also
    // return the unused tail of the internal buffer.
    return msOutput.ToArray();
}

protected void SavePDF(byte[] storeData, string fileName)
{
    Response.Clear();
    Response.ClearContent();
    Response.ClearHeaders();

    Response.AddHeader("content-disposition", "attachment;filename=" + fileName);
    Response.ContentType = "application/pdf";
    // WebName is "utf-8"; Encoding.UTF8.ToString() would yield the .NET type name instead.
    Response.Charset = Encoding.UTF8.WebName;
    Response.ContentEncoding = Encoding.UTF8;
    Response.Buffer = true;

    Response.BinaryWrite(storeData);
    Response.Flush();
    Response.End();
}

The result looks like this: ŠŽÝŽÝÁÁÍÉ šžýáíéú, which is wrong. Characters such as Ě, Č, Ř and Ů are missing from the output. These are Czech characters.
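One way to narrow this down is to check whether the missing letters are exactly the ones that Windows-1252 cannot represent. The sketch below is only a diagnostic and does not use iTextSharp. The class `Cp1252Check` and its method `KeepEncodable` are hypothetical names, and the idea that the font used for the output is limited to Windows-1252 (WinAnsi) is an assumption to be tested, not something confirmed by the code above.

```csharp
using System;
using System.Text;

static class Cp1252Check
{
    // Hypothetical helper: keeps only the characters that Windows-1252 can represent.
    public static string KeepEncodable(string text)
    {
        // Needed on .NET Core / .NET 5+, where code page 1252 is not available by default.
        // On .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Throw on unmappable characters instead of substituting '?' or a best-fit letter.
        Encoding cp1252 = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        var kept = new StringBuilder();
        foreach (char c in text)
        {
            try
            {
                cp1252.GetBytes(new[] { c });
                kept.Append(c);
            }
            catch (EncoderFallbackException)
            {
                // No Windows-1252 byte exists for this character, so it is dropped.
            }
        }
        return kept.ToString();
    }

    static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;

        const string input = "ĚŠČŘŽŘÝŽÝÁÁÍÉŮ ěščřžýáíéůú";    // text from the HTML body
        const string reported = "ŠŽÝŽÝÁÁÍÉ šžýáíéú";            // text seen in the PDF

        string kept = KeepEncodable(input);
        Console.WriteLine(kept);
        // Compare the filtered text with the output reported above.
        Console.WriteLine(kept == reported);
    }
}
```

If the comparison prints True, the dropped letters are precisely those outside Windows-1252, which would point at the font or its encoding rather than at the UTF-8 handling of the input string.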


I also tried using XMLParser, like this:

private static byte[] CreatePdfUsingXslt(string htmlText, string serverPath)
{
    var msInput = new MemoryStream(Encoding.UTF8.GetBytes(htmlText));
    var msOutput = new MemoryStream();

    msInput.Position = 0;
    msOutput.Position = 0;

    var doc = new Document(PageSize.A4, 30, 30, 30, 30);
    PdfWriter pdfWriter = PdfWriter.GetInstance(doc, msOutput);

    var htmlPipelineContext = new HtmlPipelineContext();
    var cssResolver = XMLWorkerHelper.GetInstance().GetDefaultCssResolver(true);

    var cssPath = String.Format("{0}XsltTemplates\\print.css", serverPath);

    cssResolver.AddCssFile(cssPath, true);

    var pipeline = new CssResolverPipeline(
        cssResolver,
        new HtmlPipeline(htmlPipelineContext, new PdfWriterPipeline(doc, pdfWriter)));
    var xmlWorker = new XMLWorker(pipeline, true);
    var xmlParser = new XMLParser(true, xmlWorker);

    xmlParser.Parse(new StreamReader(msInput, Encoding.UTF8));
    xmlParser.Flush();

    doc.Close();
    return msOutput.ToArray();
}

but the call xmlParser.Parse(new StreamReader(msInput, Encoding.UTF8)); throws a NullReferenceException.

Getting this second version to work matters more to me. Does anyone have an idea how to fix it?

Source: https://stackoverflow.com/questions/14896966/itextsharp-htmlworker-bad-convert-html-encrypt-utf-8
