How to find out what makes poi corrupt a xlsx / xlsm file

有些话、适合烂在心里 提交于 2020-01-03 02:48:25

问题


I have the issue that Apache POI "corrupted" a xlsm / xlsx file by just reading and writing it (e.g. with the following code)

public class Snippet {
    public static void main(String[] args) throws Exception {

        String str1 = "c:/tmp/spreadsheet.xlsm";
        String str2 = "c:/tmp/spreadsheet_poi.xlsm";

        // open file
        XSSFWorkbook wb = new XSSFWorkbook(new FileInputStream(new File(str1)));

        // save file
        FileOutputStream out = new FileOutputStream(str2);
        wb.write(out);
        wb.close();
        out.close();

    }
}

Once you open the spreadsheet_poi.xlsm in Excel you'll get an error like the following

"We found a problem with some content in xxx. Do you want us to try to recover as much as we can..."?

If you say yes you'll end up with a log which could look like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
    <logFileName>error145040_01.xml</logFileName>
    <summary>Errors were detected in file 'C:\tmp\spreadsheet_poi.xlsm'</summary>
    <repairedParts>
        <repairedPart>Repaired Part: /xl/worksheets/sheet4.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet5.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
        <repairedPart>Repaired Part: /xl/worksheets/sheet8.xml part with XML error.  Load error. Line 2, column 0.</repairedPart>
    </repairedParts>
</recoveryLog>

Whats the best approach to debug the issue in more detail (e.g. find out what makes poi to "corrupt" the file?


回答1:


Eventually I found how that the best approach for debugging this are two things

  1. open the affected workbook (e.g. with 7zip and format the affected sheets with an xml editor (e.g. Notepad++ > Plugins > XML Tools > Pretty print (XML only - with line breaks). After saving the files and updating the xlsm file you'll get the "real" line numbers in the Excel error log
  2. if the real line numbers not already help compare the sheet xml files of the original xlsx file and the one saved by poi. You'll notice that there are differences in regards to the attributes and also the order is different. In order to properly compare I used Beyond Compare with "Additional File Formats" (see https://weblogs.asp.net/lorenh/comparing-xml-files-with-beyond-compare-3-brilliant for more information). Maybe there is another diff tool that is equally good.

In my case the problem was that poi somehow changed the dimension setting from

<dimension ref="A1:XFD147"/>

to

<dimension ref="A1:XFE147"/>

(with XFE beeing a non existing column). I fixed it by removing those many empty columns in the original xlsx file.




回答2:


My professor said: "How does the mathematician find the lion in the desert?" - "First cuts the desert into two halves, finds out where is the lion, then repeats it until the lion is caught".

So, try to remove features from the Excel files, try different versions, until you find the root cause. There may be multiple causes, though.



来源:https://stackoverflow.com/questions/54077244/how-to-find-out-what-makes-poi-corrupt-a-xlsx-xlsm-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!