XPath expression to remove whitespace

野性不改 2020-12-08 00:18

I have this HTML:

 
   
     

        
6 Answers
  • 2020-12-08 00:30

    I. Use this single XPath expression:

    translate(normalize-space(/tr/td/a), ' ', '')
    

    Explanation:

    1. normalize-space() produces a new string from its argument, in which all leading and trailing whitespace (space, tab, NL or CR characters) is deleted and each internal run of whitespace is replaced by a single space character.

    2. translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.
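The two steps can be imitated outside an XPath engine to see what each one contributes. The following is only a rough pure-Python sketch, not an XPath implementation; the helper name `normalize_space` and the sample string `raw` are assumptions made for illustration.

```python
import re

# XPath 1.0 treats only these four characters as whitespace.
XPATH_WS = " \t\n\r"

def normalize_space(s: str) -> str:
    """Imitate XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(f"[{XPATH_WS}]+", " ", s).strip(" ")

# Hypothetical text content of the <a> element.
raw = "\n      16 :\t 00\n   "

step1 = normalize_space(raw)     # leading/trailing removed, runs collapsed
step2 = step1.replace(" ", "")   # what translate(..., ' ', '') does to the result

print(repr(step1))
print(repr(step2))
```

After the first step only single spaces can remain, which is why the translate() call in the expression needs to list just the space character.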


    II. Alternatively:

    translate(/tr/td/a, ' &#9;&#10;&#13;', '')
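For readers unfamiliar with translate(), its three-argument behaviour can be sketched in plain Python: each character of the second argument is replaced by the character at the same position in the third, and characters with no counterpart are deleted. The function name `xpath_translate` and the sample string are assumptions; this is an imitation for illustration, not part of any XPath library.

```python
def xpath_translate(s: str, src: str, dst: str) -> str:
    """Imitate XPath 1.0 translate(s, src, dst)."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) in table:
            continue  # the first occurrence of a character in src wins
        # Characters with no counterpart in dst are deleted (mapped to None).
        table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)

# Hypothetical text content containing all four XPath whitespace characters.
raw = "\n\t 16 :\r\n 00 "

# Second argument: space, tab, line feed, carriage return; third: empty.
stripped = xpath_translate(raw, " \t\n\r", "")
print(repr(stripped))
```

Because the third argument is empty, every listed whitespace character is deleted in a single pass, so no prior normalize-space() call is needed.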
    
  • 2020-12-08 00:43

    Get the inner content of the tags with an XPath expression, then use trim() (assuming you're using PHP) or an equivalent function to cut away any whitespace at the beginning or end.
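In Python, the counterpart to PHP's trim() is str.strip(). A minimal sketch using only the standard library follows; the markup is hypothetical, since the question's HTML is not shown above, and ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the original question's HTML is not available.
row = ET.fromstring(
    '<tr><td class="score-time status">'
    '<a href="#">\n      16 : 00\n    </a>'
    '</td></tr>'
)

link = row.find("td/a")      # simple path, within ElementTree's XPath subset
text = link.text.strip()     # trim leading and trailing whitespace in Python

print(repr(text))
```

Note that strip() leaves the spaces inside the value untouched; only the surrounding whitespace is removed.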

  • 2020-12-08 00:45

    I came across this thread when I was having my own issue similar to above.

    HTML

    <div class="d-flex">
    <h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
      <a href="/nsomar/OAStackView/releases/tag/1.0.1">
    
        1.0.1
      </a>
    

    XPath start command

    tree.xpath('//div[@class="d-flex"]/h4/a/text()')
    

    However this grabbed random whitespace and gave me the output of:

    ['\n          ', '\n        1.0.1\n      ']
    

    Adding a normalize-space() predicate dropped the whitespace-only first text node and left me with just the node I wanted:

    tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')
    
    ['\n        1.0.1\n      ']
    

    I could then grab the first element of the list, and use strip() to remove any further whitespace

    XPath final command

    tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()
    

    Which left me with exactly what I required:

    1.0.1
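If changing the query is not an option, the same cleanup can be done on the returned list in Python instead of in the predicate. This sketch starts from the list printed by the first query above; the variable names are assumptions.

```python
# The list returned by the first query, copied verbatim from the output above.
nodes = ['\n          ', '\n        1.0.1\n      ']

# Drop whitespace-only strings, then trim what remains.
cleaned = [t.strip() for t in nodes if t.strip()]

print(cleaned)
```

This avoids indexing with [0], which would raise an IndexError if the query matched nothing.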
    
  • 2020-12-08 00:49

    Try the following XPath expression:

    //td[@class='score-time status']/a[normalize-space() = '16 : 00']
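The effect of comparing normalize-space() against '16 : 00' can be approximated with the standard library, whose ElementTree module does not support XPath functions. In this sketch the table markup, the href values, and the helper `normalized_text` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical table; the structure and the times are assumptions.
table = ET.fromstring(
    '<table>'
    '<tr><td class="score-time status"><a href="/m/1">\n  16 : 00\n</a></td></tr>'
    '<tr><td class="score-time status"><a href="/m/2"> 18 : 30 </a></td></tr>'
    '</table>'
)

def normalized_text(elem):
    # Join all descendant text, then collapse whitespace. str.split() also
    # splits on Unicode whitespace, so it is slightly broader than XPath.
    return " ".join("".join(elem.itertext()).split())

# ElementTree handles the attribute predicate; the text test is done in Python.
links = table.findall(".//td[@class='score-time status']/a")
matches = [a for a in links if normalized_text(a) == "16 : 00"]
hrefs = [a.get("href") for a in matches]

print(hrefs)
```

The comparison succeeds even though the first link's text is surrounded by newlines, which is the point of normalizing before comparing.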
    
  • 2020-12-08 00:51
    • You can check whether text() nodes are empty:

      /path/text()[not(.='')]

    This can be useful with axes such as following-sibling:: when the text is not wrapped in container elements, or with child::. Note that it only excludes nodes whose string value is empty; a node made up solely of spaces or newlines still passes.

    • You can use string() or the regular-expression functions of XPath 2.0 (matches(), replace(), tokenize()).

    NOTE: some comments say that XPath cannot do string manipulation. Even though it is not really designed for that, you can do basic things with contains() and starts-with(), and with replace() in XPath 2.0.

    Checking for whitespace-only nodes is harder, because you generally have a node list as the result set, and most XPath string functions, such as matches() or replace(), operate on only one node.

    • You can separate node selection from string manipulation.

    So you may use XPath to retrieve a container, or a list of text nodes, and then process it in another language (Java, PHP, Python, or Perl, for instance).

  • 2020-12-08 00:52

    You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
