How to read a doc file using POI?

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-14 03:27:17

Question


I am trying to display a Word file in my editor pane. I tried this code:

import java.awt.Dimension;
import java.awt.GridLayout;
import java.io.File;
import java.io.FileInputStream;
import javax.swing.JEditorPane;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

public class editorpane extends JEditorPane
{
public editorpane(File file)
{

    try
    {
        FileInputStream fis = new FileInputStream(file.getAbsolutePath());
        HWPFDocument hwpfd = new HWPFDocument(fis);
        WordExtractor we = new WordExtractor(hwpfd);
        String[] array = we.getParagraphText();
        for (int i = 0; i < array.length; i++)
        {
            this.setPage(array[i]);
        }

    } catch (Exception e)
    {
        e.printStackTrace();
    }
}
}

but it throws this exception:

org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:106)
at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:174)
at frame1.editorpane.<init>(editorpane.java:24)

at this line:

HWPFDocument hwpfd = new HWPFDocument(fis);

How can I solve this?

Also, I am not sure about these lines:

for (int i = 0; i < array.length; i++)
{
    this.setPage(array[i]);
}

Can someone confirm whether they are correct?


Answer 1:


You are trying to open a .docx file (XWPF) with code for .doc (HWPF) files. You can use XWPFWordExtractor for .docx files.
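As a rough illustration of the XWPF route, the sketch below reads a .docx stream either paragraph by paragraph or as one string. It assumes the poi-ooxml artifact is on the classpath; the class name DocxTextReader and its method names are made up for this example and are not part of POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical helper class, named for this example only.
class DocxTextReader {

    // Returns the text of each body paragraph in document order.
    // Paragraphs nested inside tables are not included by getParagraphs().
    static List<String> readParagraphs(InputStream in) throws IOException {
        try (XWPFDocument doc = new XWPFDocument(in)) {
            List<String> paragraphs = new ArrayList<>();
            for (XWPFParagraph p : doc.getParagraphs()) {
                paragraphs.add(p.getText());
            }
            return paragraphs;
        }
    }

    // Returns the whole document text as a single string.
    static String readAllText(InputStream in) throws IOException {
        // Closing the document releases the underlying package;
        // the extractor only wraps that same document.
        try (XWPFDocument doc = new XWPFDocument(in)) {
            XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of a .docx file.
        try (InputStream in = new FileInputStream(args[0])) {
            for (String paragraph : readParagraphs(in)) {
                System.out.println(paragraph);
            }
        }
    }
}
```

readParagraphs plays the role that getParagraphText() had in the HWPF code from the question, so the rest of that code could stay much the same.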

There is also an ExtractorFactory, which lets POI decide which of the two formats applies and pick the correct class to open the file. However, you then cannot iterate over the individual paragraphs as your code does, because only the generic getText() method is available.

Use it like this:
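For intuition about how the two formats can be told apart, here is a minimal sketch that looks only at the first bytes of a file. It is not POI's own detection code: WordFormatSniffer and its Format enum are hypothetical names, and a ZIP signature only shows that the file is a ZIP container (as .docx files are), not that it is a valid Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, named for this example only.
class WordFormatSniffer {

    enum Format { OLE2, ZIP_BASED, UNKNOWN }

    // Signature at the start of OLE2 compound files (.doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Local file header signature at the start of ZIP archives (.docx, .xlsx, ...).
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by its leading bytes.
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Format.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Format.ZIP_BASED;
        }
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Format sniff(Path file) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while (filled < header.length
                    && (read = in.read(header, filled, header.length - filled)) > 0) {
                filled += read;
            }
        }
        return sniff(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A file saved as .docx would come back as ZIP_BASED here, which is the situation the OfficeXmlFileException in the question reports. In POI itself, ExtractorFactory makes this choice for you.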

POITextExtractor extractor = ExtractorFactory.createExtractor(file);
String text = extractor.getText();
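On the loop from the question: JEditorPane.setPage(String) expects a URL, not document text, so passing a paragraph to it will fail, and a setter called inside a loop would keep only the last value anyway. One possible approach, sketched below, is to join the text first and hand it to the pane once. PlainTextPane and its methods are hypothetical names, and the sketch assumes plain text is enough (Word formatting is lost).

```java
import javax.swing.JEditorPane;

// Hypothetical helper class, named for this example only.
class PlainTextPane {

    // Joins paragraphs into one string, one paragraph per line.
    // Trailing line-break characters on each paragraph are dropped first.
    static String joinParagraphs(String[] paragraphs) {
        StringBuilder sb = new StringBuilder();
        for (String paragraph : paragraphs) {
            int end = paragraph.length();
            while (end > 0 && (paragraph.charAt(end - 1) == '\r'
                    || paragraph.charAt(end - 1) == '\n')) {
                end--;
            }
            sb.append(paragraph, 0, end).append('\n');
        }
        return sb.toString();
    }

    // Creates a read-only pane that shows the given text as plain text.
    static JEditorPane createPane(String text) {
        JEditorPane pane = new JEditorPane("text/plain", text);
        pane.setEditable(false);
        return pane;
    }
}
```

With the extractor from the answer, the call would be PlainTextPane.createPane(extractor.getText()); with an array of paragraphs, run it through joinParagraphs first.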


Source: https://stackoverflow.com/questions/34924676/how-to-read-doc-file-using-poi
