I am new to lxml and want to extract the text of the <p>PARAGRAPHS</p>
and <li>PARAGRAPHS</li> elements
from a given URL and use it in further steps.
I followed an example from a post, and tried the following code with no luck:
html = lxml.html('http://www.google.com/intl/en/about/corporate/index.html')
url = 'http://www.google.com/intl/en/about/corporate/index.html'
print html.parse.xpath('//p/text()')
I looked through the examples for lxml.html, but didn't find any that use a URL.
Could you give me a hint on which methods I should use? Thanks.
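import lxml.html
# parse() accepts a URL (or a filename) and returns an ElementTree
htmltree = lxml.html.parse('http://www.google.com/intl/en/about/corporate/index.html')
# text() selects the text nodes that are direct children of each <p>
print(htmltree.xpath('//p/text()'))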
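The snippet above only covers <p>, while the question also asks about <li>. A minimal sketch of one way to collect both, assuming the HTML is already available as a string (the helper name extract_paragraphs and the sample markup are made up for illustration, so that the example runs without network access):

```python
import lxml.html

def extract_paragraphs(html_text):
    """Return the text of every <p> and <li> element, in document order."""
    root = lxml.html.fromstring(html_text)
    # A single XPath step with a self:: test matches both element types.
    elements = root.xpath('//*[self::p or self::li]')
    # text_content() also includes text inside nested tags such as <b>,
    # which '//p/text()' would skip.
    return [el.text_content().strip() for el in elements]

# Hypothetical sample page, used here instead of a live URL.
sample = """
<html><body>
  <p>First <b>bold</b> paragraph.</p>
  <ul><li>Item one</li><li>Item two</li></ul>
  <p>Second paragraph.</p>
</body></html>
"""

print(extract_paragraphs(sample))
```

For a live page, the same XPath expression should work on the tree returned by lxml.html.parse(url), since the tree object also has an xpath() method.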
Source: https://stackoverflow.com/questions/7785463/parse-paragraphs-from-html-using-lxml