Parse paragraphs from HTML using lxml

Submitted by 心已入冬 on 2019-12-08 02:36:05

Question


I am new to lxml and want to extract the text of <p>PARAGRAPHS</p> and <li>PARAGRAPHS</li> elements from a given URL so I can use it in further steps.

I followed an example from a post, and tried the following code with no luck:

html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')

I looked through the examples for lxml.html, but didn't find any that load a page from a URL.

Could you give me a hint on which methods I should use? Thanks.


Answer 1:


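import lxml.html

# parse() accepts a filename, a URL, or a file-like object and returns an ElementTree
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')

# '//p/text()' selects the direct text nodes of every <p> element
print(htmltree.xpath('//p/text()'))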
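
The question also asks about <li> elements, and '//p/text()' returns only the text nodes sitting directly inside each <p>, so text wrapped in nested tags such as <b> or <a> is skipped. A minimal sketch of one way to cover both, assuming an inline sample document in place of the real page: SAMPLE_HTML and the helper names extract_paragraphs and fetch_and_extract are made up for illustration. As far as I know, lxml's built-in URL loading does not handle https, so the sketch fetches the page with the standard library and passes the bytes to lxml.

```python
import urllib.request

import lxml.html

# Hypothetical sample document standing in for the real page.
SAMPLE_HTML = """
<html><body>
  <p>First <b>bold</b> paragraph.</p>
  <ul>
    <li>Item one</li>
    <li>Item <a href="#">two</a></li>
  </ul>
  <p>Second paragraph.</p>
</body></html>
"""


def extract_paragraphs(html_text):
    """Return the text of every <p> and <li> element, in document order."""
    root = lxml.html.fromstring(html_text)
    # '//p | //li' is an XPath union of both element types.
    # text_content() includes text inside nested tags; split/join
    # collapses runs of whitespace into single spaces.
    return [" ".join(el.text_content().split())
            for el in root.xpath("//p | //li")]


def fetch_and_extract(url):
    """Download a page with urllib, then extract its paragraphs (not called here)."""
    with urllib.request.urlopen(url) as response:
        return extract_paragraphs(response.read())


paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

To run it against a live page, call fetch_and_extract(url) with the address you want. Note that a <p> nested inside an <li> would be reported twice with this XPath, once for each element.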


Source: https://stackoverflow.com/questions/7785463/parse-paragraphs-from-html-using-lxml
