xpath expression to remove whitespace

前端未结

关注

 6  1973

I have this HTML:


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  难免孤独        
                
              
                            
                2020-12-08 00:30
              
            
            
                                                                       
I. Use this single XPath expression:

translate(normalize-space(/tr/td/a), ' ', '')


Explanation:


normalize-space() produces a new string from its argument, in which any leading or trailing white-space (space, tab, NL or CR characters) is deleted and any intermediary white-space is replaced by a single space character.
translate() takes the result produced by normalize-space() and produces a new string in which each of the remaining intermediary spaces is replaced by the empty string.




II. Alternatively:

translate(/tr/td/a, ' &#9;&#10;&#13', '')

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2020-12-08 00:43
              
            
            
                                                                       
Get the inner content of the  tags with an xpath-expressen, then use trim() (assuming you're using php) or some equivalent function to cut away any whitespace at the beginning or end.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  执念已碎        
                
              
                            
                2020-12-08 00:45
              
            
            
                                                                       
I came across this thread when I was having my own issue similar to above.

HTML

<div class="d-flex">
<h4 class="flex-auto min-width-0 pr-2 pb-1 commit-title">
  <a href="/nsomar/OAStackView/releases/tag/1.0.1">

    1.0.1
  </a>


XPath start command

tree.xpath('//div[@class="d-flex"]/h4/a/text()')


However this grabbed random whitespace and gave me the output of:

['\n          ', '\n        1.0.1\n      ']


Using normalize-space, it removed the first blank space node and left me with just what I wanted

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')

['\n        1.0.1\n      ']


I could then grab the first element of the list, and use strip() to remove any further whitespace

XPath final command

tree.xpath('//div[@class="d-flex"]/h4/a/text()[normalize-space()]')[0].strip()


Which left me with exactly what I required:

1.0.1

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  爱一瞬间的悲伤        
                
              
                            
                2020-12-08 00:49
              
            
            
                                                                       
Please try the below xpath expression :

//td[@class='score-time status']/a[normalize-space() = '16 : 00']

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小鲜肉        
                
              
                            
                2020-12-08 00:51
              
            
            
                                                                       

you can check if text() nodes are empty.

/path/text()[not(.='')]


it may be useful with axes like following-sibling:: if these are no containers, or with child::.


you can use string() or the regex() function of xpath 2.


NOTE: some comments say that xpath cannot do string manipulation... even if it's not really designed for that you can do basic things: contains(), starts-with(), replace().

if you want to check whitespace nodes it's much harder, as you will generally have a nodelist result set, and most xpath functions, like match or replace, only operate one node.


you can separate node and string manipulation


So you may use xpath to retrieve a container, or a list of text nodes, and then process it with another language. (java, php, python, perl for instance).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  挽巷        
                
              
                            
                2020-12-08 00:52
              
            
            
                                                                       
You can use XPath's normalize-space() as in //a[normalize-space()="16 : 00"]
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复