Equivalent to InnerHTML when using lxml.html to parse HTML


无人共我 2020-12-01 15:52

I'm working on a script that uses lxml.html to parse web pages. I have used BeautifulSoup a fair bit, but I am now experimenting with lxml because of its speed.

4 answers

鱼传尺愫 (OP)

2020-12-01 16:19

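    import lxml.etree as ET
    from lxml import html

    # Assumption: page_source holds the HTML string to parse (not defined in the original answer)
    t = html.fromstring(page_source)
    body = t.xpath("//body")
    for tag in body:
        # Serialize each child of <body>, then re-parse it so the XPath only sees that fragment
        h = html.fromstring(ET.tostring(tag[0])).xpath("//h1")
        p = html.fromstring(ET.tostring(tag[1])).xpath("//p")
        htext = h[0].text_content()
        ptext = p[0].text_content()  # the original read h[0] here by mistake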

You can also use .get('href') to read a single attribute of a tag, and .attrib to access all of its attributes.
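
A minimal sketch of the attribute access mentioned above. The sample markup and the variable names are made up for illustration:

```python
import lxml.html

# Hypothetical sample markup, only for illustration
doc = lxml.html.fromstring('<div><a href="/docs" class="nav">Docs</a></div>')
link = doc.xpath("//a")[0]

href = link.get("href")      # value of one attribute
missing = link.get("title")  # .get() returns None when the attribute is absent
attrs = dict(link.attrib)    # .attrib behaves like a dict of all attributes

print(href, missing, attrs)
```

Copying .attrib into a plain dict is optional; it just gives a snapshot that is detached from the element.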

Here the tag index is hardcoded, but you can also look the tags up dynamically.
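
One possible way to avoid the hardcoded indexes is to select the children by tag name with a relative XPath instead of by position. This is only a sketch; the sample page and the variable names are assumptions, not part of the original answer:

```python
import lxml.html

# Hypothetical sample page, only for illustration
page = lxml.html.fromstring(
    "<html><body><h1>Title</h1><p>First paragraph.</p></body></html>"
)

for body in page.xpath("//body"):
    # "./h1" and "./p" match direct children by name, whatever their position
    headings = body.xpath("./h1")
    paragraphs = body.xpath("./p")
    # Guard against missing tags instead of indexing blindly
    htext = headings[0].text_content() if headings else None
    ptext = paragraphs[0].text_content() if paragraphs else None

print(htext, ptext)
```

This also skips the serialize-and-reparse step, since the relative XPath already limits the search to the current body element.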
