Conditional XML parsing in Python

a 夏天 posted on 2019-12-11 18:27:20

Question


I would like to select the information of all child elements in a very large XML file if their parent has certain information. As in the sample below, if the sn node has the attribute elliptic="yes", then select the v node and retrieve its attribute values (e.g. wd="vulgui").

 <sentence>
<sadv arg="argM" func="cc" tem="tmp">
  <sadv>
    <grup.adv>
      <r lem="després" pos="rg" wd="Després"/>
      <sp>
        <prep>
          <s lem="de" pos="sps00" postype="preposition" wd="de"/>
        </prep>
        <sn entityref="nne">
          <spec gen="m" num="p">
            <z lem="15" ne="number" wd="15"/>
          </spec>
          <grup.nom gen="m" num="p">
            <n gen="m" lem="any" num="p" pos="ncmp000" postype="common" sense="16:10917509" wd="anys"/>
            <sp>
              <prep>
                <s lem="de" pos="sps00" postype="preposition" wd="de"/>
              </prep>
              <sn entityref="nne">
                <spec gen="f" num="s">
                  <d coreftype="ident" entity="entity3" entityref="nne" gen="f" lem="el_seu" num="s" person="3" pos="dp3fs0" postype="possessive" wd="la_seva"/>
                </spec>
                <grup.nom gen="f" num="s">
                  <n gen="f" lem="creació" num="s" pos="ncfs000" postype="common" sense="16:00583085" wd="creació"/>
                </grup.nom>
              </sn>
            </sp>
          </grup.nom>
        </sn>
      </sp>
    </grup.adv>
  </sadv>
  <f lem="," pos="fc" punct="comma" wd=","/>
</sadv>
<sn arg="arg0" coreftype="ident" elliptic="yes" entity="entity3" entityref="nne" func="suj" tem="agt"/>
<grup.verb>
  <v lem="presentar" lss="A32.ditransitive-patient-benefactive" mood="indicative" num="p" person="3" pos="vmip3p0" postype="main" tense="present" wd="presenten"/>
</grup.verb>
<sn arg="arg1" entityref="spec" func="cd" tem="pat">
  <spec gen="m" num="s">
    <d gen="m" lem="un" num="s" pos="di0ms0" postype="indefinite" wd="un"/>
  </spec>
  <grup.nom gen="m" num="s">
    <s.a gen="m" num="s">
      <grup.a gen="m" num="s">
        <a gen="m" lem="nou" num="s" pos="aq0ms0" postype="qualificative" wd="nou"/>
      </grup.a>
    </s.a>
    <n gen="m" lem="disc" num="s" pos="ncms000" postype="common" sense="16:03112307" wd="disc"/>
    <sn entityref="ne" ne="other">
      <f lem="," pos="fc" punct="comma" wd=","/>
      <grup.nom>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
        <n lem="Electroretard" ne="other" pos="np0000a" postype="proper" sense="16:cs1" wd="Electroretard"/>
        <f lem="'" pos="fz" punct="mathsign" wd="'"/>
      </grup.nom>
    </sn>
  </grup.nom>
</sn>
<f lem="." pos="fp" punct="period" wd="."/>

I couldn't get any further than this:

for sn in root.iter('sn'):
    rank = sn.get('elliptic')
    if rank == 'yes':
        ...  # not sure how to continue from here

How could I continue this code? I was thinking of something like:

"iterate through all children whose parents contain @elliptic="yes""


Answer 1:


As I understand it, the simplest way is to build an XPath expression and use it inside an if check or a try/except block:

xpath = '(//sn[@elliptic="yes"])[1]'

Now write an if statement that checks whether this element is present in your XML, and if it is, do what you need. For example, if the check is true, use further XPath expressions to extract what is needed.
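A minimal sketch of that check, assuming the standard library's xml.etree.ElementTree. ElementTree only supports a subset of XPath (no parenthesised expressions such as the one above), so the condition is written as .//sn[@elliptic='yes']. The corpus wrapper element and the second sentence are made-up sample data, not part of the question.

```python
import xml.etree.ElementTree as ET

# Made-up, shortened sample: one sentence with an elliptic <sn>, one without.
SAMPLE = """
<corpus>
  <sentence>
    <sn elliptic="yes" func="suj"/>
    <grup.verb><v lem="presentar" wd="presenten"/></grup.verb>
  </sentence>
  <sentence>
    <sn func="suj"><n lem="disc" wd="disc"/></sn>
    <grup.verb><v lem="sonar" wd="sona"/></grup.verb>
  </sentence>
</corpus>
"""

root = ET.fromstring(SAMPLE)
words = []
for sentence in root.iter("sentence"):
    # find() returns None when nothing matches. Compare with None explicitly:
    # truth-testing an Element is unreliable (a childless element has
    # historically been falsy).
    if sentence.find(".//sn[@elliptic='yes']") is not None:
        # The condition holds, so extract what is needed from this sentence.
        words.extend(v.get("wd") for v in sentence.iter("v"))

print(words)
```

Only the first sentence passes the check, so only its v element contributes a wd value.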

P.S. The [1] selects only the first matching element in the XML. Without it the expression matches every such element, and code that expects a single result can break when there is more than one. To process them all, use an index i that runs over the matches: (//sn[@elliptic="yes"])[i]
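Rather than indexing the matches one by one, it is usually simpler to loop over them. The sketch below again assumes xml.etree.ElementTree, which has no parent pointers and no following-sibling axis, so it walks each parent's child list instead. It also assumes, as in the question's sample, that the v node sits inside the element immediately after the elliptic sn. The function name verbs_after_elliptic_sn is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's sample.
SAMPLE = """
<sentence>
  <sn elliptic="yes" func="suj"/>
  <grup.verb>
    <v lem="presentar" wd="presenten"/>
  </grup.verb>
  <sn func="cd">
    <n lem="disc" wd="disc"/>
  </sn>
  <f lem="." wd="."/>
</sentence>
"""

def verbs_after_elliptic_sn(root):
    """Collect the attributes of every <v> found in the sibling that
    directly follows an <sn elliptic="yes"> element."""
    results = []
    for parent in root.iter():
        children = list(parent)
        # The last child has no following sibling, so it is skipped.
        for i, child in enumerate(children[:-1]):
            if child.tag == "sn" and child.get("elliptic") == "yes":
                next_sibling = children[i + 1]
                for v in next_sibling.iter("v"):
                    results.append(dict(v.attrib))
    return results

root = ET.fromstring(SAMPLE)
verbs = verbs_after_elliptic_sn(root)
print([v["wd"] for v in verbs])
```

Each entry in verbs is a plain dict of the v element's attributes, so values such as wd or lem can be read by key.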



Source: https://stackoverflow.com/questions/49792703/conditional-xml-parsing-in-python
