Docx4J Generated XLSX file is always corrupt

三世轮回 提交于 2021-02-11 15:19:14

问题


TL;DR: Excel Workbook generated by Docx4J always says corrupted but I can't determine what Excel doesn't like about the underlying XML, let alone how to fix it.

My use case is as follows: I am trying to produce an excel workbook with charts and graphs automatically on a regular basis. Only the raw data will change but everything else will dynamically update as the raw data is changed.

So I built an excel workbook which has a number of charts and graphs being generated by a sheet of raw data. I am using it as a template. All values of the raw data are numeric. The intent was to use Docx4J to read this 'template' and to populate the raw data sheet, then save it as a new file whereupon opening will initiate the recalculation and the charts and graphs will update. Since I am new to Docx4j, I basically decided to do baby steps by first seeing if I could open and read the contents of the cells; which I could. So far so good. I also could change the values of the cells but I could only verify this programatically by writing out to the console the location and value before a change, then the location and value after the change (ex. A1=45 followed by A1=55).

My problem starts when I try to open the resulting file. It generates, looks to be about the right size but Excel claims it is corrupted. It does try to recover what it can, but ultimately fails and the workbook won't even open. For troubleshooting, I opened up the generated xlsx and confirmed all the various XML files that make up an xlsx file were present and readable so I am concluding either something is missing or some part of the XML coming out the other side is not what Excel wants. Further troubleshooting involved creating an empty workbook (no data, 1 sheet) as my 'template', opening it and then saving it back to the file system with a different name and simply trying to see if I could open it in Excel but no dice. This has me ruling out anything to do with my attempts to write or add data to the sheet.

Relevant Environment Information:

  • 'template' workbook is being generated on a Windows 10 64bit machine
  • My docx4j code is executing on a Debian 10 Linux machine running OpenJDK 11.0.4
  • My version of Excel both to create the 'template' and open the copy is Excel for Office365
  • I am running Docx4J v11.1.3 but I also tried with v8.1.5(both cases I had to use the Reference Implementation of JAXB to get around a marshalling error when trying to save)

I did see another post on Stackoverflow here about an issue related to fonts in Linux environments so I made sure to install the MS TT Corefonts but it didn't help my problem.

I ran the entire unzipped directory through BeyondCompare and there are some differences but I don't know which are just artifacts of the two different OS' or even which differences matter. Mostly they are:

  • small differences in file size
  • boolean values showing as "1", "yes", or "true" but not the same way for both files
  • namespaces and attributes in one file but not the other

Sheet1 from my blank workbook, before and after

All ideas are welcome.


回答1:


Please try the just-released docx4j 8.1.6, which fixes handling of xlsx files created by recent releases of Excel. This was https://github.com/plutext/docx4j/issues/389



来源:https://stackoverflow.com/questions/61069158/docx4j-generated-xlsx-file-is-always-corrupt

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!