Question
I am using PDFBox in Java to extract text from a PDF file. This is how I load the file:
PDDocument document = PDDocument.load(new File(path1));
As you can see, this opens the file and loads its contents. That can cause problems with a very large file, for example one containing 10 million words: the call throws java.lang.OutOfMemoryError: Java heap space.
I actually tested this and it does throw the error, and the culprit was the line above. Is there a way to open the file in PDFBox without loading its contents?
I would appreciate any suggestions.
Answer 1:
Use:
PDDocument doc = PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly());
This configures PDFBox to buffer the document's data in temporary files on disk instead of in main memory, with no limit on the size of those files.
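As a rough sketch of how this setting might fit into a full extraction, the example below assumes PDFBox 2.x (the API used in the answer) and streams the extracted text to a file rather than collecting it with getText, since a single String holding the text of a huge document would still have to fit in the heap. The class name LowMemoryTextExtractor and the method extractToFile are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, not part of PDFBox.
public class LowMemoryTextExtractor {

    // Loads the PDF with temp-file buffering and writes its text to outputPath.
    // Returns the number of pages in the document.
    public static int extractToFile(File pdf, Path outputPath) throws IOException {
        // try-with-resources closes the document, which also lets PDFBox
        // clean up the temporary files it created for buffering.
        try (PDDocument doc = PDDocument.load(pdf, MemoryUsageSetting.setupTempFileOnly());
             Writer out = Files.newBufferedWriter(outputPath, StandardCharsets.UTF_8)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // writeText sends the text to the writer as it is extracted,
            // instead of building one large String as getText would.
            stripper.writeText(doc, out);
            return doc.getNumberOfPages();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LowMemoryTextExtractor <input.pdf> <output.txt>");
            return;
        }
        int pages = extractToFile(new File(args[0]), Paths.get(args[1]));
        System.out.println("Extracted text from " + pages + " page(s)");
    }
}
```

If disk-only buffering turns out to be too slow, MemoryUsageSetting.setupMixed(maxMainMemoryBytes) is an alternative that keeps up to the given number of bytes in main memory and spills the rest to temporary files.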
Source: https://stackoverflow.com/questions/53551335/java-does-pdfbox-have-an-option-to-open-file-instead-of-loading-it