How to split a .doc into several .doc using JAVA POI?

Posted by 冷眼眸甩不掉的悲伤 on 2019-12-12 02:02:08

Question


I am using POI to read .doc files, and I want to select some of the content to form new .doc files. Specifically, is it possible to write the content of a Paragraph in the Range to a new file? Thank you.

HWPFDocument doc = new HWPFDocument(fs);
Range range = doc.getRange();
for (int i = 0; i < range.numParagraphs(); i++) {
    // Here I want to write the content of each Paragraph
    // into a new .doc file ("doc1", "doc2", ...)
    // instead of calling doc.write(pathName), which writes only one .doc file.
}

Answer 1:


Here is code that handles this task. The selection criterion is deliberately simple: paragraphs with (zero-based) indices 11 to 20 go to the file "us.docx", and those with indices 21 to 30 go to "japan.docx".

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;


public class SplitDocs {

    public static void main(String[] args) {

        // try-with-resources closes the input stream even if reading fails
        try (FileInputStream in = new FileInputStream("wto.doc")) {
            HWPFDocument doc = new HWPFDocument(in);

            // The two output documents, built up in memory
            XWPFDocument us = new XWPFDocument();
            XWPFDocument japan = new XWPFDocument();

            Range range = doc.getRange();

            for (int parIndex = 0; parIndex < range.numParagraphs(); parIndex++) {
                Paragraph paragraph = range.getParagraph(parIndex);

                String text = paragraph.text();
                System.out.println("***Paragraph" + parIndex + ": " + text);

                // Route the paragraph text by its zero-based index
                if ( (parIndex >= 11) && (parIndex <= 20) ) {
                    createParagraphInAnotherDocument(us, text);
                } else if ( (parIndex >= 21) && (parIndex <= 30) ) {
                    createParagraphInAnotherDocument(japan, text);
                }
            }

            // Both output streams are closed automatically, even if a write fails
            try (FileOutputStream outUs = new FileOutputStream("us.docx");
                 FileOutputStream outJapan = new FileOutputStream("japan.docx")) {
                us.write(outUs);
                japan.write(outJapan);
            }

        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    private static void createParagraphInAnotherDocument(XWPFDocument document, String text) {
        XWPFParagraph newPar = document.createParagraph();
        newPar.createRun().setText(text, 0);
    }

}
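
One detail worth checking in the code above: as far as I know, the text returned by an HWPF Paragraph usually still ends with the paragraph mark (a carriage return), and text from table cells ends with the cell mark (U+0007). If that is the case for your file, you may want to trim those characters before passing the text to the new document. The sketch below is plain Java with no POI dependency; the class name ParagraphText and the method stripTrailingMarks are hypothetical names of mine, not part of POI.

```java
// Hypothetical helper, not part of Apache POI.
final class ParagraphText {

    private ParagraphText() {
        // utility class, not meant to be instantiated
    }

    /**
     * Removes trailing carriage returns, line feeds and cell marks (U+0007)
     * from the end of the given text. Characters in the middle of the text
     * are left untouched.
     */
    static String stripTrailingMarks(String text) {
        int end = text.length();
        while (end > 0) {
            char c = text.charAt(end - 1);
            if (c == '\r' || c == '\n' || c == '\u0007') {
                end--;
            } else {
                break;
            }
        }
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        // Brackets make any leftover trailing characters visible
        System.out.println("[" + stripTrailingMarks("Hello\r") + "]");
    }
}
```

In the answer's code this would be applied as createParagraphInAnotherDocument(us, ParagraphText.stripTrailingMarks(text)).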

I used .docx as the output format because it is much easier to add new paragraphs to a .docx file than to a .doc file. The method insertAfter(ParagraphProperties props, int styleIndex), which inserts a new Paragraph into a given range, is now deprecated (I am using POI version 3.10), and I couldn't find an easy, logical way to create a new Paragraph object in an empty .doc file. By contrast, the XWPF call is straightforward and clean: XWPFParagraph newPar = document.createParagraph();. Note that this approach copies only the paragraph text; formatting is not carried over.

The code still takes a .doc file as input, as your task requires. Hope this helps :)

P.S. Here we use a simple selection criterion based on paragraph indices. If you need something like a font-based criterion, as you mentioned, you will probably want to post another question, or maybe you'll find the solution yourself. Either way, things get easier with .docx.
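
If the number of output files grows, the index-based criterion can be moved out of the if/else chain into a small lookup. The following is only a sketch in plain Java (no POI calls); ParagraphRouter, add and targetFor are hypothetical names of mine, and the ranges simply repeat the ones used in the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps zero-based paragraph indices to output file names.
final class ParagraphRouter {

    // output file name -> {firstIndex, lastIndex}, both inclusive
    private final Map<String, int[]> ranges = new LinkedHashMap<>();

    /** Registers an inclusive index range for the given output file. */
    ParagraphRouter add(String fileName, int first, int last) {
        if (first > last) {
            throw new IllegalArgumentException("first must not exceed last");
        }
        ranges.put(fileName, new int[] {first, last});
        return this;
    }

    /** Returns the file a paragraph index belongs to, or null if it matches no range. */
    String targetFor(int parIndex) {
        for (Map.Entry<String, int[]> entry : ranges.entrySet()) {
            int[] range = entry.getValue();
            if (parIndex >= range[0] && parIndex <= range[1]) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ParagraphRouter router = new ParagraphRouter()
                .add("us.docx", 11, 20)
                .add("japan.docx", 21, 30);

        // Print the routing decision for a few sample indices
        for (int parIndex : new int[] {5, 11, 20, 21, 30, 31}) {
            System.out.println(parIndex + " -> " + router.targetFor(parIndex));
        }
    }
}
```

Inside the paragraph loop, targetFor(parIndex) would then pick the output document, and indices that return null are skipped. Ranges are checked in the order they were added, so if two ranges overlap the first one registered wins.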




Answer 2:


I have been in the same situation; see "Apache POI - Split Word document (docx) to pages" for a solution. One word of caution: while that solution is better than the one above in the sense that it generates formatted pages, it falls short in handling tables and images.



Source: https://stackoverflow.com/questions/25092384/how-to-split-a-doc-into-several-doc-using-java-poi
