Get second element text with XPath?

感情败类 2020-12-10 11:13

  <span class='python'>
    <a>google</a>
    <a>chrome</a>
  </span>

I want to get chrome and have it w

3 Answers
  • 2020-12-10 11:41

    I'm not sure what the problem is...

    >>> d = """<span class='python'>
    ...   <a>google</a>
    ...   <a>chrome</a>
    ... </span>"""
    >>> from lxml import etree
    >>> d = etree.HTML(d)
    >>> d.xpath('.//span[@class="python"]/a[2]/text()')
    ['chrome']
    >>>
    
  • 2020-12-10 11:46

    From the comments:

    or the simplification of the actual HTML I posted is too simple

    You are right. What does .//span[@class="python"]//a[2] mean? It expands to:

    self::node()
     /descendant-or-self::node()
      /child::span[attribute::class="python"]
       /descendant-or-self::node()
        /child::a[position()=2]
    

    This selects each a element that is the second a child of its parent (position() counts along the child axis). So nothing will be selected if your document looks like:

    <span class='python'> 
      <span> 
        <span> 
          <img></img> 
          <a>google</a><!-- This is the first "a" child of its parent --> 
        </span> 
        <a>chrome</a><!-- This is also the first "a" child of its parent --> 
      </span> 
    </span> 
    

    If you want the second a among all descendants, use:

    descendant::span[@class="python"]/descendant::a[2]
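
A minimal sketch of the difference, assuming lxml is installed. The markup is the nested sample from this answer; the variable names are only illustrative.

```python
from lxml import etree

# Nested sample: each <a> is the first <a> child of its own parent.
doc = etree.HTML("""
<span class='python'>
  <span>
    <span>
      <img></img>
      <a>google</a>
    </span>
    <a>chrome</a>
  </span>
</span>
""")

# a[2] on the child axis: "second <a> child of its parent". No parent has two.
per_parent = doc.xpath('.//span[@class="python"]//a[2]/text()')

# a[2] on the descendant axis: "second <a> descendant in document order".
document_order = doc.xpath(
    'descendant::span[@class="python"]/descendant::a[2]/text()'
)

print(per_parent)      # []
print(document_order)  # ['chrome']
```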
    
  • 2020-12-10 11:49

    I tried this but it doesn't work.

    t = item.findtext('.//span[@class="python"]//a[2]')
    

    This is a FAQ about the // abbreviation.

    .//a[2] means: select all a descendants of the current node that are the second a child of their parent. So it may select more than one element, or none, depending on the concrete XML document.

    To put it more simply, the [] operator has higher precedence than //.

    If you want just one node (the second) of all those returned, use parentheses to force the precedence you want:

    (.//a)[2]

    This really selects the second a descendant of the current node.
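
The precedence rule can be checked with a small sketch, assuming lxml is installed. The sample markup below is made up for illustration so that .//a[2] matches more than one element.

```python
from lxml import etree

# Hypothetical sample: two parents, each with two <a> children.
root = etree.fromstring("""
<div>
  <span><a>one</a><a>two</a></span>
  <span><a>three</a><a>four</a></span>
</div>
""")

# [2] binds to the a step: second <a> child of each parent.
per_parent = root.xpath('.//a[2]/text()')

# Parentheses collect all <a> descendants first, then [2] picks from that set.
overall = root.xpath('(.//a)[2]/text()')

print(per_parent)  # ['two', 'four']
print(overall)     # ['two']
```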

    For the actual expression used in the question, change it to:

    (.//span[@class="python"]//a)[2]
    

    or, to select its text node directly, change it to:

    (.//span[@class="python"]//a)[2]/text()
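
Note that findtext() belongs to the ElementTree-style API, which documents only a limited XPath subset (no parenthesized expressions and no text()), so the expressions above are meant for a full XPath engine such as lxml's xpath(). A rough equivalent with only the standard library, assuming the markup from the question wrapped in a hypothetical div root, is to fetch all matches and index the list:

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in an illustrative <div> root.
item = ET.fromstring("""
<div>
  <span class="python">
    <a>google</a>
    <a>chrome</a>
  </span>
</div>
""")

# findall returns matches in document order; index 1 is the second one.
links = item.findall('.//span[@class="python"]//a')
second_text = links[1].text if len(links) > 1 else None

print(second_text)  # chrome
```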
    