How can I query a Word docx in an ASP.NET app?

时间秒杀一切 提交于 2019-12-04 23:45:22

问题


I would like to upload a Word 2007 or greater docx file to my web server and convert the table of contents to a simple xml structure. Doing this on the desktop with traditional VBA seems like it would have been easy. Looking at the WordprocessingML XML data used to create the docx file is confusing. Is there a way (without COM) to navigate the document in more of an object-oriented fashion?


回答1:


I highly recommend looking into the Open XML SDK 2.0. It's a CTP, but I've found it extremely useful in manipulating xmlx files without having to deal with COM at all. The documentation is a bit sketchy, but the key thing to look for is the DocumentFormat.OpenXml.Packaging.WordprocessingDocument class. You can pick apart the .docx document if you rename the extension to .zip and dig into the XML files there. From doing that, it looks like a Table of Contents is contained in a "Structured Document" tag and that things like the headings are in a hyperlink from there. Putzing around with it a bit, I found that something like this should work (or at least give you a starting point).

WordprocessingDocument wordDoc = WordprocessingDocument.Open(Filename, false);
SdtBlock contents = wordDoc.MainDocumentPart.Document.Descendants<SdtBlock>().First();
List<string> contentList = new List<string>();
foreach (Hyperlink section in contents.Descendants<Hyperlink>())
{
    contentList.Add(section.Descendants<Text>().First().Text);
}



回答2:


Here is a blog post on querying Open XML WordprocessingML documents using LINQ to XML. Using that code, you can write a query as follows:

using (WordprocessingDocument doc =
    WordprocessingDocument.Open(filename, false))
{
    foreach (var p in doc.MainDocumentPart.Paragraphs())
    {
        Console.WriteLine("Style: {0}   Text: >{1}<",
            p.StyleName.PadRight(16), p.Text);
        foreach (var c in p.Comments())
            Console.WriteLine(
              "  Comment Author:{0}  Text:>{1}<",
              c.Author, c.Text);
    }
}

Blog post: Open XML SDK and LINQ to XML

-Eric




回答3:


See XML Documents and Data as a starting point. In particular, you'll want to use LINQ to XML.

In general, you do not want to use COM in a .NET application.



来源:https://stackoverflow.com/questions/1296743/how-can-i-query-a-word-docx-in-an-asp-net-app

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!