Equivalent to InnerHTML when using lxml.html to parse HTML

无人共我 2020-12-01 15:52

I'm working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.

4 Answers
  •  鱼传尺愫
    2020-12-01 16:19

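    from lxml import html

    # Example input; normally t = html.fromstring(page_source)
    t = html.fromstring("<html><body><h1>Title</h1><p>Some text</p></body></html>")

    body = t.xpath("//body")
    for tag in body:
        # Query the first and second children of <body> directly; there is
        # no need to serialize them with etree.tostring() and parse them again.
        h = tag[0].xpath("descendant-or-self::h1")
        p = tag[1].xpath("descendant-or-self::p")
        htext = h[0].text_content()
        ptext = p[0].text_content()  # read the <p> here, not h[0]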
    

    You can also use .get('href') on an <a> tag to read a single attribute, and .attrib to get all of an element's attributes.
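A minimal sketch of both calls; the HTML snippet and the variable names are made up for illustration. .get() returns None when the attribute is absent, and .attrib is a dict-like view of every attribute on the element.

```python
from lxml import html

# Made-up snippet used only for illustration.
doc = html.fromstring('<div><a href="/docs" class="nav" title="Docs">Read the docs</a></div>')

link = doc.xpath("//a")[0]
href = link.get("href")      # one attribute: "/docs"
missing = link.get("rel")    # attribute not present: None
attrs = dict(link.attrib)    # all attributes copied into a plain dict

print(href)
print(attrs)
```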

    Here the child indices (tag[0], tag[1]) are hard-coded, but you can also find the elements dynamically.
