C# LINQ xml parsing using “PreviousNode”

Submitted by 南笙酒味 on 2019-12-12 14:24:46

Question


With quite some help from SO, I managed to put together the following LINQ expression.

var parentids = xliff.Descendants()
                     .Elements(xmlns + "trans-unit")
                     .Elements(xmlns + "seg-source")
                     .Elements(xmlns + "mrk")
                     .Where(e => e.Attribute("mtype").Value == "seg")
                     .Select(item => (XElement)item.Parent.Parent.PreviousNode)
                         .Where(item => item != null)
                         .Select(item => item.Elements(xmlns + "source")
                             .Where(itema => itema != null)
                             .Select(itemb => itemb.Elements(xmlns + "x")             
                             .LastOrDefault()
                             .Attribute("id")
                             .Value.ToString())).ToArray();

What it does: it locates a mrk tag (one that has @mtype="seg"), goes up to the trans-unit ancestor (.Parent.Parent), and checks whether the previous sibling trans-unit has a child trans. If it does not, it returns the @id of the last x element from the source child; otherwise it returns null (it must return null; it cannot simply return no match).

I should add that while the samples below have only one such previous node with no trans element, the real-life XML has many more, so I must use PreviousNode.

Here is the XML it works with, and returns "2" perfectly:

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" version="1.2" sdl:version="1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="Pasadena_Internet_2016.xml" source-language="en-US" datatype="x-sdlfilterframework2" target-language="da-DK">
    <body>
      <trans-unit id="d679cb2d-ecba-47ba-acb7-1bb4a798c755" translate="no">
        <source>
          <x id="0" />
          <x id="1" />
          <x id="2" />
        </source>
      </trans-unit>
      <trans-unit id="aed9fde2-fd1b-4eba-bfc9-06d325aa7047">
        <source>
          <x id="3" />Pasadena, California’s iconic Colorado Boulevard <x id="4" />has been the site of the world-famous Tournament of Roses Parade since it began in 1890.
        </source>
        <seg-source>
          <mrk mtype="seg" mid="1">
            <x id="3" />Pasadena, California’s iconic Colorado Boulevard <x id="4" />has been the site of the world-famous Tournament of Roses Parade since it began in 1890.
          </mrk>
        </seg-source>
        <target>
          <mrk mtype="seg" mid="1">
            <x id="3" /><x id="4" />Pasadena, Californiens ikoniske Colorado Boulevard har været stedet for den verdensberømte Rose Bowl-parade siden den begyndte i 1890.
          </mrk>
        </target>
      </trans-unit>
    </body>
  </file>
</xliff>

The problem I need to solve as a last step is that there is another type of XML in which the starting trans-unit is wrapped in a group element that is not present in the other XML. So here there is one more parent to jump up before getting the previous trans-unit sibling, which sits right before the group.

I am trying to build this into the same LINQ expression so it handles both scenarios.

In fact, if I change line 6 to this, it works:

.Select(item => (XElement)item.Parent.Parent.Parent.PreviousNode)
<!--                                        ^------ additional Parent --> 

Here is the other XML that right now throws an exception with the above code, but it should return "0":

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" sdl:version="1.0">
  <file original="Internet_Anti-DrugIntro2015.xml_1457007.xlf" datatype="x-sdlfilterframework2" source-language="en-US" target-language="hu-HU">
    <body>
      <trans-unit translate="no" id="c3a13bfb-ed51-49cf-8278-e2c86c2114c0">
        <source>
          <x id="0"/>
        </source>
      </trans-unit>
      <group>
        <sdl:cxts>
          <sdl:cxt id="1"/>
        </sdl:cxts>
        <trans-unit id="3b4520df-4483-4c9e-8a9b-ce2544269f3e">
          <source>
            <x id="1"/>
          </source>
          <seg-source>
            <mrk mtype="seg" mid="2">
              <x id="1"/>Drugs are robbing our children of their future.
            </mrk>
            <mrk mtype="seg" mid="3">
              <x id="2"/>Every 17 seconds a teenager experiments with an illicit drug for the first time.
            </mrk>
          </seg-source>
          <target>
            <mrk mtype="seg" mid="2">
              <x id="1"/>A drogok megfosztják gyermekeinket a jövőjüktől.
            </mrk>
            <mrk mtype="seg" mid="3">
              <x id="2"/>17 másodpercenként egy újabb tizenéves próbálja ki először a kábítószereket.
            </mrk>
          </target>
        </trans-unit>
      </group>
      <trans-unit translate="no" id="7890462c-edcb-4fe6-9192-033ba76d9942">
        <source>
          <x id="183"/>
        </source>
      </trans-unit>
    </body>
  </file>
</xliff>

I will be more than appreciative for any help.


Answer 1:


Instead of navigating up the XML tree using Parent several times depending on the XML structure, you can try using Ancestors().Last() to find the highest level ancestor named either "trans-unit" or "group", and then navigate to the previous node.

Try replacing this part:

.Select(item => (XElement) item.Parent.Parent.PreviousNode)

with this one:

.Select(item => (XElement)item.Ancestors()
                              .Last(o => new[]{"trans-unit","group"}.Contains(o.Name.LocalName))
                              .PreviousNode)
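
To see the replacement in context, here is a minimal self-contained sketch of the whole query with the Ancestors().Last(...) step in place. It is not the original poster's code: the class and method names (PreviousUnitLookup, FindParentIds) are hypothetical, the nested Select is flattened so that each mrk yields exactly one string or null, and "as XElement" is swapped in for the hard cast so that a non-element previous node gives null instead of an InvalidCastException. It also assumes the document was loaded without LoadOptions.PreserveWhitespace, so PreviousNode is not a whitespace text node.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class PreviousUnitLookup
{
    // Hypothetical helper, not part of the original post.
    // For every <mrk mtype="seg"> inside a <seg-source>, find the outermost
    // enclosing <trans-unit> or <group>, step to the node just before it, and
    // return the id of the last <x> in that node's <source> (null if absent).
    public static string[] FindParentIds(XDocument xliff)
    {
        XNamespace xmlns = "urn:oasis:names:tc:xliff:document:1.2";
        var containers = new[] { "trans-unit", "group" };

        return xliff.Descendants(xmlns + "trans-unit")
                    .Elements(xmlns + "seg-source")
                    .Elements(xmlns + "mrk")
                    // The string cast on an attribute is null-safe, unlike .Value.
                    .Where(e => (string)e.Attribute("mtype") == "seg")
                    // Ancestors() runs from nearest to farthest, so Last(...)
                    // picks the outermost matching container.
                    .Select(e => e.Ancestors()
                                  .Last(a => containers.Contains(a.Name.LocalName))
                                  .PreviousNode as XElement)
                    // Null-conditional chain: any missing piece yields null.
                    .Select(prev => (string)prev?.Element(xmlns + "source")
                                                 ?.Elements(xmlns + "x")
                                                 .LastOrDefault()
                                                 ?.Attribute("id"))
                    .ToArray();
    }
}
```

Called as PreviousUnitLookup.FindParentIds(XDocument.Parse(xml)), this gives one entry per matching mrk. The first sample in the question has a single mrk in seg-source, so the result is a one-element array containing "2". The second sample has two, both inside the same group, so "0" appears twice.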


Source: https://stackoverflow.com/questions/36927046/c-sharp-linq-xml-parsing-using-previousnode