Extract text from XML documents in Python

挽巷 2020-12-20 09:58

This is the sample XML document:

<bookstore>
    <book category="COOKING">
        <title lang="english">Everyday Italian</title>
3 answers

悲哀的现实 2020-12-20 10:33

This is possible with the lxml library and an XPath query:

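<pre><code>from lxml import etree

# Sample document; the author, year and price tag names are assumed.
xml = """<bookstore>
    <book category="COOKING">
        <title lang="english">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>300.00</price>
    </book>

    <book>
        <title>Harry Potter</title>
        <author>J K. Rowling </author>
        <year>2005</year>
        <price>625.00</price>
    </book>
</bookstore>
"""

# fromstring() returns the root element directly
root = etree.fromstring(xml)

# Select the text of every child element of every book
root.xpath('/bookstore/book/*/text()')
# ['Everyday Italian', 'Giada De Laurentiis', '2005', '300.00', 'Harry Potter', 'J K. Rowling ', '2005', '625.00']
</code></pre>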

Note that this does not give you the category attribute, because text() only selects text nodes.
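
If the category is needed as well, one option is to walk the book elements and read the attribute alongside the child texts. The following is a minimal sketch, not the answerer's method. It uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra installs. The helper name book_records is hypothetical, and the author, year and price tag names are assumptions about the sample data.

```python
import xml.etree.ElementTree as ET

# Sample data; the author, year and price tag names are assumed for illustration.
XML = """<bookstore>
    <book category="COOKING">
        <title lang="english">Everyday Italian</title>
        <author>Giada De Laurentiis</author>
        <year>2005</year>
        <price>300.00</price>
    </book>
</bookstore>"""


def book_records(xml_text):
    """Return one dict per <book>: its category attribute plus child texts."""
    root = ET.fromstring(xml_text)  # root is the <bookstore> element
    records = []
    for book in root.findall("book"):
        # get() returns None when the attribute is absent
        record = {"category": book.get("category")}
        for child in book:
            # Key each text value by the child's tag name
            record[child.tag] = (child.text or "").strip()
        records.append(record)
    return records


records = book_records(XML)
print(records)
```

With lxml itself, the attribute values alone can also be selected with an XPath such as /bookstore/book/@category.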
