Get all the links in an HTML page using lxml

Submitted by 对着背影说爱祢 on 2019-12-09 18:48:05

Question


I want to extract all the URLs and their names (the link text) from an HTML page using lxml.

I can parse the page and work this out by hand, but is there an easier way to find all the links using lxml?


Answer 1:


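from lxml.html import parse

# parse() accepts a filename, file object, or URL; cssselect() needs the cssselect package
dom = parse('http://www.google.com/').getroot()
links = dom.cssselect('a')  # list of <a> elements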
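
The snippet above returns the `<a>` elements themselves, while the question asks for each URL together with its name. A minimal sketch of that last step follows, assuming the "name" means the visible link text. The `SAMPLE` markup, the `extract_links` helper, and the `https://example.com/` base URL are made up for illustration and are not part of the original answer. It uses XPath instead of `cssselect`, so it does not need the extra cssselect package.

```python
from lxml import html

# Hypothetical sample page, used only so the sketch is self-contained.
SAMPLE = """
<html><body>
  <a href="/docs">Docs</a>
  <a href="https://example.com/about">About <b>us</b></a>
  <a name="top">No href here</a>
</body></html>
"""

def extract_links(markup, base_url=None):
    """Return a list of (href, link text) pairs for every <a> that has an href."""
    doc = html.fromstring(markup)
    if base_url is not None:
        # Rewrite relative hrefs such as "/docs" against the given base URL.
        doc.make_links_absolute(base_url)
    # text_content() also collects text from nested tags such as <b>.
    return [(a.get("href"), a.text_content().strip())
            for a in doc.xpath("//a[@href]")]

links = extract_links(SAMPLE, base_url="https://example.com/")
for url, name in links:
    print(url, name)
```

Anchors without an `href` attribute are skipped by the `//a[@href]` expression, so every pair has a usable URL.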



Answer 2:


from lxml import cssselect, html

with open("/you/path/index.html", "r") as f:
    fileread = f.read()

dochtml = html.fromstring(fileread)

# Compile the selector once, then collect the href of every <a> element
select = cssselect.CSSSelector("a")
links = [el.get('href') for el in select(dochtml)]

for n, l in enumerate(links):
    print(n, l)


Source: https://stackoverflow.com/questions/10383383/get-all-the-links-of-html-using-lxml
