How to parse MathML in the output of WordOpenXML?

Posted by 心已入冬 on 2019-12-09 13:13:28

Question


I want to read only the XML that describes an equation, which I obtained using Paragraph.Range.WordOpenXML. However, the section for the equation is not MathML, even though I had read that Microsoft's equations are stored as MathML.

Do I need a special converter to get the desired XML, or is there another method?


Answer 1:


You can use the OMML2MML.XSL stylesheet that ships with Microsoft Office (located under %ProgramFiles%\Microsoft Office\Office15 for Office 2013) to transform the Office Math Markup Language (OMML) equations contained in a Word document into MathML.
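Because the folder depends on the installed Office version and bitness, it can help to probe a few likely locations instead of hard-coding one. The following is only a sketch: the class `Omml2MmlLocator` and its methods are hypothetical names, and the candidate folders (including the extra `root` folder assumed for Click-to-Run installs) are assumptions about typical installations that you should verify on your machine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Omml2MmlLocator
{
    // Builds candidate locations for OMML2MML.XSL. The folder layout is an
    // assumption about typical Office installs, not a documented guarantee.
    public static IEnumerable<string> CandidatePaths(string officeVersion)
    {
        var roots = new[]
        {
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles),
            Environment.GetFolderPath(Environment.SpecialFolder.ProgramFilesX86)
        };

        foreach (string root in roots.Distinct())
        {
            // Classic layout, as described in the answer.
            yield return Path.Combine(root, "Microsoft Office", "Office" + officeVersion, "OMML2MML.XSL");
            // Assumed layout for Click-to-Run installs (extra "root" folder).
            yield return Path.Combine(root, "Microsoft Office", "root", "Office" + officeVersion, "OMML2MML.XSL");
        }
    }

    // Returns the first candidate that exists on disk, or null if none does.
    public static string Find(string officeVersion)
    {
        return CandidatePaths(officeVersion).FirstOrDefault(File.Exists);
    }
}
```

For example, `Omml2MmlLocator.Find("15")` returns a path that can be passed to `XslCompiledTransform.Load`, or null when the stylesheet is not found in any of the assumed folders.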

The code below transforms the equations in a Word document into MathML using the following steps:

  1. Open the Word document using the Open XML SDK (version 2.5).
  2. Create an XslCompiledTransform and load the OMML2MML.XSL file.
  3. Transform the main document XML by calling the Transform() method on the XslCompiledTransform instance.
  4. Output the result of the transform (e.g. print it to the console or write it to a file).

I've tested the code below with a simple Word document containing two equations, text and pictures.

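using System.IO;
using System.Linq;   // needed for First() in the single-equation query further down
using System.Text;   // needed for Encoding.UTF8
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;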

public string GetWordDocumentAsMathML(string docFilePath, string officeVersion = "14")
{
    string officeML = string.Empty;
    using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
    {
        string wordDocXml = doc.MainDocumentPart.Document.OuterXml;

        XslCompiledTransform xslTransform = new XslCompiledTransform();

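        // OMML2MML.XSL ships with Office. The "OfficeNN" folder depends on the installed
        // version (Office14 = Office 2010, Office15 = Office 2013), so pass the matching
        // officeVersion or adjust the path for your installation.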
        xslTransform.Load(@"c:\Program Files\Microsoft Office\Office" + officeVersion + @"\OMML2MML.XSL");

        using (TextReader tr = new StringReader(wordDocXml))
        {
            // Load the xml of your main document part.
            using (XmlReader reader = XmlReader.Create(tr))
            {
                using (MemoryStream ms = new MemoryStream())
                {
                    XmlWriterSettings settings = xslTransform.OutputSettings.Clone();

                    // Configure xml writer to omit xml declaration.
                    settings.ConformanceLevel = ConformanceLevel.Fragment;
                    settings.OmitXmlDeclaration = true;

                    XmlWriter xw = XmlWriter.Create(ms, settings);

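                    // Transform the Office Math Markup Language (OMML) to MathML.
                    xslTransform.Transform(reader, xw);
                    // Flush buffered output into the MemoryStream before reading it back.
                    xw.Flush();
                    ms.Seek(0, SeekOrigin.Begin);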

                    using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
                    {
                        officeML = sr.ReadToEnd();
                        // Console.Out.WriteLine(officeML);
                    }
                }
            }
        }
    }
    return officeML;
}

To convert a single equation (rather than the whole Word document), query for the desired Office Math paragraph (m:oMathPara) and use the OuterXml property of that node. The code below shows how to query for the first math paragraph (First() requires using System.Linq):

string mathParagraphXml = 
      doc.MainDocumentPart.Document.Descendants<DocumentFormat.OpenXml.Math.Paragraph>().First().OuterXml;

Pass the returned XML to the StringReader in place of wordDocXml; the rest of the transform stays the same.
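Putting the single-equation query and the transform together might look like the sketch below. The class `EquationConverter`, its method names and the `xslPath` parameter are hypothetical names introduced for illustration. It writes to a StringBuilder instead of a MemoryStream, which is a simplification of the code above rather than the exact approach of this answer.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using System.Xml;
using System.Xml.Xsl;
using DocumentFormat.OpenXml.Packaging;

public static class EquationConverter
{
    // Applies an already-loaded stylesheet to one XML fragment and returns the result.
    public static string TransformFragment(XslCompiledTransform xslt, string inputXml)
    {
        XmlWriterSettings settings = xslt.OutputSettings.Clone();
        settings.ConformanceLevel = ConformanceLevel.Fragment;
        settings.OmitXmlDeclaration = true;

        var output = new StringBuilder();
        using (XmlReader reader = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            xslt.Transform(reader, writer);
        } // disposing the writer flushes it into the StringBuilder
        return output.ToString();
    }

    // Converts the first math paragraph (m:oMathPara) of a document to MathML.
    // xslPath is assumed to point at OMML2MML.XSL. Returns null if none is found.
    public static string FirstEquationToMathML(string docFilePath, string xslPath)
    {
        var xslt = new XslCompiledTransform();
        xslt.Load(xslPath);

        using (WordprocessingDocument doc = WordprocessingDocument.Open(docFilePath, false))
        {
            var mathParagraph = doc.MainDocumentPart.Document
                .Descendants<DocumentFormat.OpenXml.Math.Paragraph>()
                .FirstOrDefault();

            return mathParagraph == null
                ? null
                : TransformFragment(xslt, mathParagraph.OuterXml);
        }
    }
}
```

Using FirstOrDefault() rather than First() avoids an exception when the document contains no math paragraph. Equations typed inline within a line of text may not be wrapped in m:oMathPara; in that case querying for DocumentFormat.OpenXml.Math.OfficeMath (m:oMath) instead is probably what you want.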



Source: https://stackoverflow.com/questions/16759100/how-to-parse-mathml-in-output-of-wordopenxml
