Convert DOCX to HTML including images

送分小仙女 submitted on 2019-12-24 13:13:34

Question


I am using docx4j to convert DOCX to HTML. The conversion works and I get the HTML output, which I want to embed as the body of an email. However, I have the following issues:

  1. Images are not displayed in the email body.
  2. Spaces and bullets are lost.

Here is the code I have written:

WordprocessingMLPackage wordMLPackage;
wordMLPackage = Docx4J.load(new java.io.File(resourcePath2));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(imageFolder + resourcePath2 + "_files"); 
htmlSettings.setImageTargetUri(imageFolder +resourcePath2.substring(resourcePath2.lastIndexOf("/")+1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);

OutputStream os; 
os = new ByteArrayOutputStream();
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_SAVE_FLAT_XML);
DOCX = ((ByteArrayOutputStream)os).toString();

Answer 1:


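Instead of docx4j, you can use Apache POI together with the XDocReport XHTML converter, which extracts the document's images to a folder during the conversion: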

package tcg.doc.web.managedBeans;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;

@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
    private static final String docName = "TestDocx.docx";
    private static final String outputlFolderPath = "d:/";

    String htmlNamePath = "docHtml.html";
    String zipName="_tmp.zip";
    File docFile = new File(outputlFolderPath+docName);
    File zipFile = new File(zipName);

      public void ConvertWordToHtml() {
          // Name the image folder after the document (the original appended the
          // InputStream object, which produced a path like ".../java.io.FileInputStream@1a2b3c")
          File imageFolder = new File("target/images/" + docName);

          // try-with-resources closes both streams, so the HTML file is fully flushed
          try (InputStream doc = new FileInputStream(new File(outputlFolderPath + docName));
               OutputStream out = new FileOutputStream(new File(htmlPath()))) {

                // 1) Load DOCX into XWPFDocument
                XWPFDocument document = new XWPFDocument(doc);

                // 2) Prepare XHTML options
                XHTMLOptions options = XHTMLOptions.create();

                // 3) Extract the images into imageFolder...
                options.setExtractor(new FileImageExtractor(imageFolder));
                // ...and resolve the <img> src attributes against that folder
                options.URIResolver(new FileURIResolver(imageFolder));

                // 4) Convert to XHTML and write it to d:/docHtml.html
                XHTMLConverter.getInstance().convert(document, out, options);
            } catch (IOException ex) {
                // FileNotFoundException is an IOException; report it instead of swallowing it
                ex.printStackTrace();
            }
         }

      public static void main(String[] args) {
         ConvertWord cwoWord=new ConvertWord();
         cwoWord.ConvertWordToHtml();
         System.out.println();
    }



      public String htmlPath(){
        // d:/docHtml.html
          return outputlFolderPath+htmlNamePath;
      }

      public String zipPath(){
          // d:/_tmp.zip
          return outputlFolderPath+zipName;
      }

}

Maven dependency for pom.xml:

<dependency>
  <groupId>fr.opensagres.xdocreport</groupId>
  <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
  <version>1.0.4</version>
</dependency>

or download the JAR manually.
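Because this answer uses FileURIResolver, the generated <img> tags presumably point at files on the local disk, which an email recipient cannot open. The following is a minimal JDK-only sketch, assuming the converter writes each src attribute as a double-quoted, absolute file: URI, that post-processes the HTML and inlines every such image as a base64 data URI. InlineImages, inlineFileImages and the extension-to-MIME mapping are hypothetical, not part of POI or XDocReport.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineImages {

    // Matches double-quoted src attributes holding a file: URI (assumed output format)
    private static final Pattern FILE_SRC = Pattern.compile("src=\"(file:[^\"]+)\"");

    // Guess the MIME type from the file extension (assumption: common raster formats only)
    static String mimeType(Path p) {
        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
        if (name.endsWith(".png")) return "image/png";
        if (name.endsWith(".jpg") || name.endsWith(".jpeg")) return "image/jpeg";
        if (name.endsWith(".gif")) return "image/gif";
        if (name.endsWith(".bmp")) return "image/bmp";
        return "application/octet-stream";
    }

    // Replace every file: image reference with an inline base64 data URI
    public static String inlineFileImages(String html) throws IOException {
        Matcher m = FILE_SRC.matcher(html);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Assumes a well-formed absolute URI; relative paths are not handled
            Path img = Paths.get(URI.create(m.group(1)));
            String b64 = Base64.getEncoder().encodeToString(Files.readAllBytes(img));
            String replacement = "src=\"data:" + mimeType(img) + ";base64," + b64 + "\"";
            // quoteReplacement stops '$' or '\' from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo: a tiny placeholder "image" file (not a real PNG) referenced by a file: URI
        Path tmp = Files.createTempFile("image1", ".png");
        Files.write(tmp, new byte[] {1, 2, 3});
        String html = "<p><img src=\"" + tmp.toUri() + "\"/></p>";
        System.out.println(inlineFileImages(html));
    }
}
```

You would run the HTML string through inlineFileImages before using it as the email body. Support for data URIs varies between mail clients, so publishing the images to a web server is the more conservative option.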




Answer 2:


For images to display in an email body, you probably need to either embed them as data URIs or publish them to a web-reachable location.

In either case, you'll need to write an implementation of:

public interface ConversionImageHandler {

    /**
     * @param picture
     * @param relationship of the image
     * @param part of the image, if it is an internal image, otherwise null
     * @return uri for the image we've saved, or null
     * @throws Docx4JException this exception will be logged, but not propagated
     */
    public String handleImage(AbstractWordXmlPicture picture, Relationship relationship, BinaryPart part) throws Docx4JException;
}

Then configure docx4j to use your implementation via htmlSettings.setImageHandler(...).

You can look at the existing implementations in the docx4j source code and take advantage of the helper methods in AbstractConversionImageHandler (e.g. createEncodedImage if you want data URIs).
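The two options can be sketched with the JDK alone. ImageUriStrategy is a hypothetical stand-in that mirrors the contract of handleImage (image in, URI for the <img> src out); it is not a docx4j type. A real handler would implement ConversionImageHandler and obtain the image bytes from the BinaryPart. The publish directory and the https://example.com/mail-images base URL are placeholder assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class ImageUriStrategies {

    // Hypothetical stand-in for handleImage(...): take the image bytes,
    // return the URI to write into the <img> src attribute
    interface ImageUriStrategy {
        String toUri(String fileName, byte[] bytes, String mimeType);
    }

    // Option 1: embed the image in the HTML itself as a base64 data URI
    static class DataUriStrategy implements ImageUriStrategy {
        @Override
        public String toUri(String fileName, byte[] bytes, String mimeType) {
            return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(bytes);
        }
    }

    // Option 2: copy the image into a directory served by a web server (assumed)
    // and return its public URL
    static class PublishedUrlStrategy implements ImageUriStrategy {
        private final Path publishDir;
        private final String baseUrl;

        PublishedUrlStrategy(Path publishDir, String baseUrl) {
            this.publishDir = publishDir;
            // Normalise so that baseUrl + fileName always has exactly one '/'
            this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        }

        @Override
        public String toUri(String fileName, byte[] bytes, String mimeType) {
            try {
                Files.createDirectories(publishDir);
                Files.write(publishDir.resolve(fileName), bytes);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return baseUrl + fileName;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] png = {1, 2, 3}; // placeholder bytes, not a real PNG
        System.out.println(new DataUriStrategy().toUri("image1.png", png, "image/png"));

        Path dir = Files.createTempDirectory("published");
        ImageUriStrategy web = new PublishedUrlStrategy(dir, "https://example.com/mail-images");
        System.out.println(web.toUri("image1.png", png, "image/png"));
    }
}
```

Data URIs keep the email self-contained but enlarge it and are not shown by every client; published URLs keep the email small, but the recipient's client has to fetch the images, which many clients block until the user allows it.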



Source: https://stackoverflow.com/questions/23005392/convert-docx-to-html-incliding-images
