I am trying to read a file in Java. This is the code:
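public void readFile(String fileName){
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
String line;
while((line=reader.readLine()) != null ){
System.out.println(line);
}
}catch (Exception ex){
ex.printStackTrace(); // report the failure instead of silently swallowing it
}
}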
It works fine for a .txt file. However, for a .docx file it prints weird characters. How can I read a .docx file in Java?
import java.io.File;
import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public void readDocxFile() {
    File file = new File("C:/NetBeans Output/documentx.docx");
    // try-with-resources closes the stream even if parsing fails
    try (FileInputStream fis = new FileInputStream(file)) {
        XWPFDocument document = new XWPFDocument(fis);
        // each XWPFParagraph holds the text of one paragraph of the document body
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph para : paragraphs) {
            System.out.println(para.getText());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Internally, .docx files are organized as zipped XML files, whereas .doc is a binary file format, so you cannot read either one line by line as plain text. Have a look at docx4j or Apache POI.
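To illustrate the "zipped XML" point, here is a minimal sketch that uses only the JDK (no POI or docx4j). It assumes the main body of the document lives in the zip entry word/document.xml and that the visible text sits in w:t elements inside w:p paragraph elements. The class name DocxTextSketch and the method readParagraphs are made up for this example. It ignores headers, footers, footnotes, and formatting, and nested paragraphs (for example inside text boxes) may show up twice, so treat it as a demonstration of the file layout rather than a replacement for a proper library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of POI or docx4j.
final class DocxTextSketch {
    // Namespace of WordprocessingML elements such as w:p and w:t
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each w:p element found in word/document.xml.
    static List<String> readParagraphs(String fileName) throws Exception {
        List<String> result = new ArrayList<>();
        // A .docx file is an ordinary zip archive, so ZipFile can open it
        try (ZipFile zip = new ZipFile(fileName)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + fileName);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            try (InputStream in = zip.getInputStream(entry)) {
                Document xml = factory.newDocumentBuilder().parse(in);
                NodeList paragraphs = xml.getElementsByTagNameNS(W_NS, "p");
                for (int i = 0; i < paragraphs.getLength(); i++) {
                    Element paragraph = (Element) paragraphs.item(i);
                    // Concatenate the text runs (w:t) that belong to this paragraph
                    NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int j = 0; j < texts.getLength(); j++) {
                        sb.append(texts.item(j).getTextContent());
                    }
                    result.add(sb.toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java DocxTextSketch <file.docx>");
            return;
        }
        for (String paragraph : readParagraphs(args[0])) {
            System.out.println(paragraph);
        }
    }
}
```

Reading the same file with a BufferedReader shows the compressed zip bytes, which is where the weird characters in the question come from.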
If you are trying to create or manipulate a .docx file, try docx4j, or go for Apache POI.
You may want to check Apache POI.
You cannot read a .docx or .doc file directly. You need an API that understands Word files. Use Apache POI: http://poi.apache.org/. If you have any doubts, see the Stack Overflow thread "How read Doc or Docx file in java?".
You must have the following 6 JARs on the classpath:
- xmlbeans-2.3.0.jar
- dom4j-1.6.1.jar
- poi-ooxml-3.8-20120326.jar
- poi-ooxml-schemas-3.8-20120326.jar
- poi-scratchpad-3.2-FINAL.jar
- poi-3.5-FINAL.jar
Code:
import java.io.File;
import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
public class test {
    public static void readDocxFile(String fileName) {
        File file = new File(fileName);
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream fis = new FileInputStream(file)) {
            XWPFDocument document = new XWPFDocument(fis);
            // this declaration was missing, so the loop below did not compile
            List<XWPFParagraph> paragraphs = document.getParagraphs();
            for (int i = 0; i < paragraphs.size(); i++) {
                System.out.println(paragraphs.get(i).getParagraphText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        readDocxFile("C:\\Users\\sp0c43734\\Desktop\\SwatiPisal.docx");
    }
}
Source: https://stackoverflow.com/questions/16682942/reading-docx-file-in-java