Question
I converted .doc to HTML using WordToHtmlConverter and it worked perfectly.
But when I tried to convert .docx to HTML, I got stuck.
What I tried:
I used the code below to convert .docx to HTML.
It is taken from: How to use Tika's XWPFWordExtractorDecorator class?
InputStream input = TikaInputStream.get(new File("C:\\Users\\Downloads\\filename.docx"));
Parser parser = new AutoDetectParser();
StringWriter sw = new StringWriter();
SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
TransformerHandler handler = factory.newTransformerHandler();
handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
handler.setResult(new StreamResult(sw));
try {
    Metadata metadata = new Metadata();
    parser.parse(input, handler, metadata, new ParseContext());
    String xml = sw.toString();
    System.out.print("tika : " + xml);
} finally {
    input.close();
}
The output I got is:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title/>
</head>
<body/>
</html>
- Please explain where I went wrong.
- Is there a better way to convert .docx to an HTML string?
I appreciate your help, thanks.
Answer 1:
This code worked for me to convert .docx to html:
You can also look at the link: Link to code
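// Convert .docx to an HTML string.
// XWPFDocument is from Apache POI; XHTMLConverter, XHTMLOptions and
// FileURIResolver are from the XDocReport XWPF-to-XHTML converter.
InputStream in = new FileInputStream(new File(path));
XWPFDocument document = new XWPFDocument(in);
// Extracted images are resolved against the "word/media" folder.
XHTMLOptions options = XHTMLOptions.create().URIResolver(new FileURIResolver(new File("word/media")));
OutputStream out = new ByteArrayOutputStream();
XHTMLConverter.getInstance().convert(document, out, options);
// ByteArrayOutputStream.toString() decodes with the platform default charset.
String html = out.toString();
System.out.println(html);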
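One detail in the snippet above is that out.toString() decodes the bytes with the platform default charset, which can garble non-ASCII text on some machines. A minimal sketch of decoding with an explicit charset instead, assuming the converter wrote its output as UTF-8 (worth verifying for your version). The class name DecodeCheck and the method decodeUtf8 are made up for illustration, and the bytes written in main are only a stand-in for real converter output:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class DecodeCheck {
    // Decodes the buffered bytes as UTF-8 rather than the platform default charset.
    // Assumption: whatever filled the buffer wrote UTF-8.
    static String decodeUtf8(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Stand-in for converter output: a fragment with a non-ASCII character.
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        System.out.println(decodeUtf8(out));
    }
}
```

In the conversion code this would mean declaring out as a ByteArrayOutputStream and replacing out.toString() with the explicit decode.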
Answer 2:
You may want to use Mammoth, a .docx to HTML library. It converts .docx documents to HTML and can run in the browser as well as on the backend.
- Supported platforms: JavaScript (both the browser and node.js, available on npm), Python (available on PyPI), WordPress, Java/JVM (available on Maven Central), .NET (available on NuGet).
- Link: https://mike.zwobble.org/projects/mammoth/ (Demo and Article)
- Github: https://github.com/mwilliamson/mammoth.js
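As for the first question (why the Tika attempt printed an empty body), one way to narrow it down is to test the serializer half of that code on its own. A minimal sketch using only the JDK, where hand-fired SAX events stand in for what a parser would emit. The class name HandlerCheck and the method render are made up for illustration:

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class HandlerCheck {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Pushes a tiny hand-built XHTML document through the same
    // TransformerHandler setup as in the question and returns the result.
    static String render(String bodyText) {
        try {
            StringWriter sw = new StringWriter();
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "html");
            handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
            handler.setResult(new StreamResult(sw));

            AttributesImpl none = new AttributesImpl();
            handler.startDocument();
            handler.startPrefixMapping("", XHTML);
            handler.startElement(XHTML, "html", "html", none);
            handler.startElement(XHTML, "body", "body", none);
            handler.startElement(XHTML, "p", "p", none);
            handler.characters(bodyText.toCharArray(), 0, bodyText.length());
            handler.endElement(XHTML, "p", "p");
            handler.endElement(XHTML, "body", "body");
            handler.endElement(XHTML, "html", "html");
            handler.endPrefixMapping("");
            handler.endDocument();
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("hello from SAX"));
    }
}
```

If this prints the paragraph text, the handler setup is fine and the empty body means the parser produced no content events. A likely cause, though not something the question confirms, is that only tika-core is on the classpath, so AutoDetectParser has no .docx parser available; adding the Tika parsers dependency would be the thing to check.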
Source: https://stackoverflow.com/questions/24652953/convert-docx-to-html-using-java