Question
I'm working on setting up an automated processing system for an (ever-growing) unstructured collection of Excel documents. The collection consists of both old-school .xls files and new .xlsx files. In my Java-based solution I am already making use of the Apache POI toolkit to analyse the documents.
One challenge that I have not been able to tackle yet is how to identify links between documents so as to chart dependencies. I have not yet been able to figure out how to conveniently extract a list of external references. For .xlsx files I have a workaround in place that unzips the file and opens the XML file holding the references. This works but is inefficient for large document collections, and also does not provide a solution for .xls files.
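For reference, the .xlsx workaround described above might look roughly like the sketch below. It streams the archive instead of extracting it to disk and collects the Target attribute of every relationship found under xl/externalLinks/_rels/. That part location is an assumption about where the references are stored (the question does not name the XML file), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper; not part of POI.
final class XlsxExternalLinkScanner {
    // Assumed location of the relationship parts that point at linked workbooks.
    private static final Pattern RELS_ENTRY =
            Pattern.compile("xl/externalLinks/_rels/externalLink\\d+\\.xml\\.rels");

    /** Returns the Target of every relationship in the external-link parts. */
    static List<String> findExternalTargets(InputStream xlsx) throws IOException {
        List<String> targets = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(xlsx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (RELS_ENTRY.matcher(entry.getName()).matches()) {
                    targets.addAll(parseTargets(readEntry(zip)));
                }
            }
        }
        return targets;
    }

    // Copies the current zip entry into memory so the parser cannot close the zip stream.
    private static byte[] readEntry(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Reads the Target attribute of each <Relationship> element.
    private static List<String> parseTargets(byte[] xml) throws IOException {
        List<String> targets = new ArrayList<>();
        try {
            NodeList rels = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getElementsByTagName("Relationship");
            for (int i = 0; i < rels.getLength(); i++) {
                targets.add(((Element) rels.item(i)).getAttribute("Target"));
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse relationship part", e);
        }
        return targets;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: XlsxExternalLinkScanner <file.xlsx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            findExternalTargets(in).forEach(System.out::println);
        }
    }
}
```

As in the question, this only covers .xlsx files; the binary .xls format is not a zip archive and needs a different approach.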
I would prefer a solution that does not depend on Microsoft Office or associated libraries, as it needs to run in a Linux environment.
Is POI capable of doing this somehow? If not, what libraries, tools, or areas would you suggest I investigate further?
Answer 1:
Ultimately I worked my way through the POI source code and used reflection to get a list of referenced external workbooks. The following code was tested to work on POI version 3.11 beta.
Note for people looking to use this method in their code: because it relies on non-public methods and classes, it is subject to change and may break in the future.
import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.LinkedList;
import org.apache.poi.hssf.model.InternalWorkbook;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

private LinkedList<String> getWorkbookReferences(HSSFWorkbook wb) {
    LinkedList<String> references = new LinkedList<>();
    try {
        // 1. Get the InternalWorkbook held in HSSFWorkbook's private "workbook" field
        Field internalWorkbookField = HSSFWorkbook.class.getDeclaredField("workbook");
        internalWorkbookField.setAccessible(true);
        InternalWorkbook internalWorkbook = (InternalWorkbook) internalWorkbookField.get(wb);

        // 2. Get the LinkTable (hidden class)
        Method getLinkTableMethod = InternalWorkbook.class.getDeclaredMethod("getOrCreateLinkTable");
        getLinkTableMethod.setAccessible(true);
        Object linkTable = getLinkTableMethod.invoke(internalWorkbook);

        // 3. Get the method that returns the external book and sheet name for an index
        Method externalBooksMethod =
                linkTable.getClass().getDeclaredMethod("getExternalBookAndSheetName", int.class);
        externalBooksMethod.setAccessible(true);

        // 4. Loop over all possible workbooks; the loop ends when the call
        //    fails with an IndexOutOfBoundsException
        int i = 0;
        try {
            while (true) {
                String[] names = (String[]) externalBooksMethod.invoke(linkTable, i++);
                if (names != null) {
                    references.add(names[0]);
                }
            }
        } catch (InvocationTargetException e) {
            // Running past the last index is expected; rethrow anything else
            if (!(e.getCause() instanceof IndexOutOfBoundsException)) {
                throw e;
            }
        }
    } catch (NoSuchFieldException | NoSuchMethodException | SecurityException
            | InvocationTargetException | IllegalAccessException e) {
        e.printStackTrace();
    }
    return references;
}
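Since the approach depends on non-public members that may be renamed or removed, it can help to isolate the reflective access so that a missing member produces an empty result rather than a stack trace deep inside the processing pipeline. The following is only a sketch of that idea using plain java.lang.reflect. ReflectionProbe, Holder, and the field names are hypothetical stand-ins and are not POI classes.

```java
import java.lang.reflect.Field;
import java.util.Optional;

// Hypothetical helper illustrating defensive access to a private field.
final class ReflectionProbe {

    /** Stand-in for a library class whose private field we want to read. */
    static final class Holder {
        private final String workbook = "internal state";
    }

    /**
     * Reads a private field by name. Returns an empty Optional when the field
     * does not exist (for example after a library upgrade) or cannot be accessed.
     */
    static Optional<Object> readPrivateField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return Optional.ofNullable(field.get(target));
        } catch (NoSuchFieldException | IllegalAccessException | RuntimeException e) {
            // RuntimeException covers SecurityException and, on newer JVMs,
            // access failures caused by module encapsulation.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        System.out.println(readPrivateField(holder, "workbook").isPresent());     // prints true
        System.out.println(readPrivateField(holder, "renamedField").isPresent()); // prints false
    }
}
```

A caller can then decide what to do when the result is empty, for example logging that the installed library version is not supported and skipping the file.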
Source: https://stackoverflow.com/questions/26758099/how-to-extract-a-list-of-external-references-from-a-excel-file