Reading XML using Python minidom and iterating over each node

后端未结

关注

 5  1642

I have an XML structure that looks like the following, but on a much larger scale:


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  天涯浪人        
                
              
                            
                2020-12-08 10:57
              
            
            
                                                                       
your authortext is of type 1 (ELEMENT_NODE), normally you need to have TEXT_NODE to get a string. This will work

a.childNodes[0].nodeValue

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  悲&欢浪女        
                
              
                            
                2020-12-08 11:08
              
            
            
                                                                       
I played around with it a bit, and here's what I got to work:

# ...
authortext= a.childNodes[0].nodeValue
print authortext


leading to output of:

C:\temp\py>xml2.py
1
Bob
Nigel
2
Alice
Mary


I can't tell you exactly why you have to access the childNode to get the inner text, but at least that's what you were looking for.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  悲&欢浪女        
                
              
                            
                2020-12-08 11:14
              
            
            
                                                                       
Element nodes don't have a nodeValue. You have to look at the Text nodes inside them. If you know there's always one text node inside you can say element.firstChild.data (data is the same as nodeValue for text nodes).

Be careful: if there is no text content there will be no child Text nodes and element.firstChild will be null, causing the .data access to fail.

Quick way to get the content of direct child text nodes:

text= ''.join(child.data for child in element.childNodes if child.nodeType==child.TEXT_NODE)


In DOM Level 3 Core you get the textContent property you can use to get text from inside an Element recursively, but minidom doesn't support this (some other Python DOM implementations do).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清歌不尽        
                
              
                            
                2020-12-08 11:14
              
            
            
                                                                       
Quick access:

node.getElementsByTagName('author')[0].childNodes[0].nodeValue

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  天命终不由人        
                
              
                            
                2020-12-08 11:18
              
            
            
                                                                       
Since you always have one text data value per author you can use element.firstChild.data

dom = parseString(document)
conferences = dom.getElementsByTagName("conference")

# Each conference here is a node
for conference in conferences:
    conference_name = conference.getAttribute("name")
    print 
    print conference_name.upper() + " - "

    authors = conference.getElementsByTagName("author")
    for author in authors:
        print "  ", author.firstChild.data
    # for

    print

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复