Need suggestions on how to extract data from a .docx/.doc file and load it into SQL Server

☆樱花仙子☆ submitted on 2019-12-04 12:16:56

The answer is choice #3: the OpenXML SDK. First, let me explain why you don't want the other choices listed above.

  1. Running Office on the server (server-side Office automation) is a bad idea, and Microsoft specifically advises against it. It's slow, and you will hit "issues" where it throws exceptions or simply fails to find things.

  2. Parsing the XML directly will work, but the XPath needed to find every place an image (or other element) can appear adds up. You would probably have to iterate over sections, whose properties (sectPr) come at the end of each section, and then handle every case: in a table cell, in a text box, positioned (floating), inline, etc.
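To make the section layout concrete: in WordprocessingML, the properties of every section except the last are stored in a sectPr inside the pPr of that section's final paragraph, while the last section's sectPr is a direct child of body. Below is a minimal sketch, in Python with only the standard library, of splitting a document body into sections by hand. It assumes a .docx file (not the older binary .doc format) and only looks at top-level paragraphs and tables; load_body and split_sections are hypothetical helper names, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def load_body(docx_path):
    """Return the <w:body> element of word/document.xml in a .docx."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return root.find("w:body", NS)


def split_sections(body):
    """Group the top-level children of <w:body> into sections.

    A paragraph whose <w:pPr> holds a <w:sectPr> closes its section; the
    final section's <w:sectPr> is a direct child of <w:body> itself.
    """
    sections, current = [], []
    for child in body:
        if child.tag == f"{{{W}}}sectPr":
            # body-level sectPr: properties of the last section
            sections.append(current)
            current = []
            continue
        current.append(child)
        if child.find("w:pPr/w:sectPr", NS) is not None:
            sections.append(current)
            current = []
    if current:
        # tolerate documents without a trailing body-level sectPr
        sections.append(current)
    return sections
```

This only finds section boundaries; locating images inside each section would still require handling every nesting case separately, which is the overhead described above.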

If you go with the OpenXML SDK, you get a LINQ interface where you can call Descendants and get everything that is an image (or whatever else you need). It also exposes sections through the SectPr (SectionProperties) node, so you can easily iterate over sections.
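The OpenXML SDK is a .NET library, so it is not shown here. As a rough analogue of the Descendants idea, here is a minimal Python sketch using only the standard library: Element.iter() visits every descendant of a node, so one loop finds a picture wherever it is nested. This is not the SDK and not the author's code. Assumptions: only DrawingML pictures (a:blip with r:embed) in the main document part are handled; legacy VML images (v:imagedata) and images in headers or footers are not covered; extract_images is a hypothetical helper name.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by DrawingML pictures and package relationships
A = "http://schemas.openxmlformats.org/drawingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PR = "http://schemas.openxmlformats.org/package/2006/relationships"


def extract_images(docx_path):
    """Return a list of (part_name, image_bytes) for each embedded DrawingML
    image referenced from word/document.xml, in document order."""
    with zipfile.ZipFile(docx_path) as zf:
        doc = ET.fromstring(zf.read("word/document.xml"))
        rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        # Relationship Id -> Target (usually relative to the word/ folder)
        targets = {
            rel.get("Id"): rel.get("Target")
            for rel in rels.iter(f"{{{PR}}}Relationship")
        }
        images, seen = [], set()
        # iter() walks *all* descendants, similar to LINQ's Descendants(),
        # so pictures in table cells, text boxes, inline or floating
        # drawings are found without a separate XPath for each case.
        for blip in doc.iter(f"{{{A}}}blip"):
            rid = blip.get(f"{{{R}}}embed")
            if rid is None or rid in seen:
                # no embedded data (e.g. linked image) or already collected
                continue
            seen.add(rid)
            target = targets[rid]
            if target.startswith("/"):
                part = target.lstrip("/")  # absolute part name
            else:
                part = posixpath.normpath(posixpath.join("word", target))
            images.append((part, zf.read(part)))
    return images
```

Each image is returned once per relationship Id, since the same picture can be referenced from more than one place. The (part_name, bytes) pairs could then be written to a varbinary(max) column in SQL Server with whatever client library you already use.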
