How to handle special characters when converting from HTML to DocX

死守一世寂寞 2021-01-15 19:00

I have an application that converts HTML files to DocX using DocX4J. I'm having problems with special characters like ç, á, é, í, ã, etc. My text font in the HTML files is Arial b

1 answer
  •  不要未来只要你来
    2021-01-15 19:53

    Following the tip given by JasonPlutext, I found an example of how to map a font to the XHTMLImporter at the DocX4J forum (http://www.docx4java.org/forums/docx-java-f6/docx-to-html-and-back-to-docx-t1913.html).

    Now my code is working! See the final version below.


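    import java.util.List;

    import org.docx4j.convert.in.xhtml.XHTMLImporter;
    import org.docx4j.convert.in.xhtml.XHTMLImporterImpl;
    import org.docx4j.jaxb.Context;
    import org.docx4j.openpackaging.exceptions.Docx4JException;
    import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
    import org.docx4j.wml.RFonts;

    public WordprocessingMLPackage export(String xhtml) {
        WordprocessingMLPackage wordMLPackage = null;
        try {
            // Map the HTML font family "Arial" to Arial in the DocX output, for both
            // the ASCII and the high-ANSI range (which covers the accented characters).
            RFonts arialRFonts = Context.getWmlObjectFactory().createRFonts();
            arialRFonts.setAscii("Arial");
            arialRFonts.setHAnsi("Arial");
            XHTMLImporterImpl.addFontMapping("Arial", arialRFonts);

            // Convert the XHTML and append the result to the main document part.
            wordMLPackage = WordprocessingMLPackage.createPackage();
            XHTMLImporter importer = new XHTMLImporterImpl(wordMLPackage);
            List<Object> content = importer.convert(xhtml, null);
            wordMLPackage.getMainDocumentPart().getContent().addAll(content);
        } catch (Docx4JException e) {
            // ...
        }
        return wordMLPackage;
    }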
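
If accented characters still come out wrong after the font mapping, one thing worth ruling out is the encoding of the XHTML string itself. The following is only a sketch, not part of DocX4J: the class XhtmlEscaper and the method escapeNonAscii are hypothetical names, and it assumes the input is already well-formed XHTML without CDATA sections or comments containing non-ASCII text. It replaces every non-ASCII character with a numeric character reference, so the string passed to importer.convert is plain ASCII and no longer depends on how the file was read.

```java
// Hypothetical helper, not a DocX4J class: makes an XHTML string ASCII-only.
final class XhtmlEscaper {

    private XhtmlEscaper() {
        // static utility, no instances
    }

    // Replaces every code point above 127 with a decimal numeric character
    // reference such as &#231; and leaves ASCII characters (including markup)
    // unchanged. Iterating over code points keeps surrogate pairs together.
    static String escapeNonAscii(String xhtml) {
        StringBuilder out = new StringBuilder(xhtml.length());
        xhtml.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

With this in place, the call in the answer's code could become importer.convert(XhtmlEscaper.escapeNonAscii(xhtml), null). An XML parser resolves the references back to the original characters, so "ç" travels as "&#231;" and should arrive in the document unchanged.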
    
        
