Parsing non-standard XML (CDATA tag)

无人共我 2020-12-19 10:20

When I tried to parse an XML document in Python using the BeautifulSoup library, I ran into some problems. The XML document that I want to parse:




        
2 Answers
  •  不知归路
    2020-12-19 11:18

    You could use BeautifulSoup to parse XML:

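    import bs4 as bs

    # Sample document: only <title> and <link> are implied by the code below;
    # the other element names (<item>, <start>, <end>, <price>, <saleprice>)
    # are assumed placeholders.
    content = '''\
    <item>
    <title><![CDATA[Title Sample]]></title>
    <link/>http://banhada.kr/?cateCode=09&amp;viewCode=S0941580
    <start>2011-10-10 09:00:00</start>
    <end>2011-10-17 09:00:00</end>
    <price>35000</price>
    <saleprice>20000</saleprice>
    </item>'''

    # The 'xml' feature requires lxml to be installed.
    soup = bs.BeautifulSoup(content, 'xml')

    title = soup.title
    print(title.string)
    # Title Sample

    # The URL is a text node that follows the empty <link/> element.
    link = soup.link.next_sibling
    print(link.strip())
    # http://banhada.kr/?cateCode=09&viewCode=S0941580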
    

    Under the hood, BeautifulSoup uses lxml to parse XML (the 'xml' feature requires lxml to be installed). Although it's not needed here, you may prefer to use lxml directly, since it also supports XPath, which gives you more succinct ways to navigate through XML:

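    import lxml.etree as ET

    # Sample document: only <title> and <link> are implied by the code below;
    # the other element names (<item>, <start>, <end>, <price>, <saleprice>)
    # are assumed placeholders.
    content = '''\
    <item>
    <title><![CDATA[Title Sample]]></title>
    <link/>http://banhada.kr/?cateCode=09&amp;viewCode=S0941580
    <start>2011-10-10 09:00:00</start>
    <end>2011-10-17 09:00:00</end>
    <price>35000</price>
    <saleprice>20000</saleprice>
    </item>'''

    doc = ET.fromstring(content)

    title = doc.find('title')
    print(title.text)
    # Title Sample

    # The URL follows the empty <link/> element, so lxml stores it as the tail.
    link = doc.find('link')
    print(link.tail.strip())
    # http://banhada.kr/?cateCode=09&viewCode=S0941580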
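
Since XPath is mentioned above as the main reason to use lxml directly, here is a minimal sketch of the same lookups written as XPath expressions. It assumes a cut-down sample document in which only `<title>` and `<link>` come from this thread; `<item>` and `<price>` are placeholder names chosen for illustration.

```python
import lxml.etree as ET

# Hypothetical sample: <item> and <price> are assumed names, not from the thread.
content = '''\
<item>
<title><![CDATA[Title Sample]]></title>
<link/>http://banhada.kr/?cateCode=09&amp;viewCode=S0941580
<price>35000</price>
</item>'''

doc = ET.fromstring(content)

# string(...) returns the text value of the first matching node;
# the CDATA section is read as ordinary character data.
title = doc.xpath('string(title)')

# The URL is not inside <link>; it is the first text node that follows it.
url = doc.xpath('string(link/following-sibling::text()[1])').strip()

# number(...) converts the element text to a float.
price = int(doc.xpath('number(price)'))

print(title)
print(url)
print(price)
```

The `following-sibling::text()[1]` step does the same job as `link.tail` in the answer's code, but keeps the whole lookup in a single expression.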
    
