Get all the links in an HTML page using lxml
Question

I want to extract every URL and its name from an HTML page using lxml. I can parse the page and pick out the URLs myself, but is there an easier way to find all the links using lxml?

Answer 1:

    from lxml.html import parse

    # Parse the page and get the root <html> element
    dom = parse('http://www.google.com/').getroot()
    # Select every <a> element with a CSS selector
    links = dom.cssselect('a')

Answer 2:

    from lxml import etree, cssselect, html

    # Read the HTML file from disk
    with open("/you/path/index.html", "r") as f:
        fileread = f.read()

    # Build an element tree from the HTML string
    dochtml = html.fromstring(fileread)
    # Compile a CSS selector that matches <a> elements
    select = cssselect.CSSSelector("a")
    links =
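Since the question asks for each URL together with its name, here is a minimal sketch of one way to collect both. It uses lxml's XPath support rather than the CSS selectors used in the answers, so it does not need the separate cssselect package. The sample markup, the `extract_links` helper, and the `https://example.com/` base URL are all hypothetical and exist only for illustration.

```python
from lxml import html

# Hypothetical sample page, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <a href="/about">About us</a>
  <a href="https://example.com/docs">Docs</a>
  <a name="top">No href here</a>
</body></html>
"""


def extract_links(markup, base_url=None):
    """Return (url, text) pairs for every <a> element that has an href."""
    doc = html.fromstring(markup)
    if base_url is not None:
        # Rewrite relative hrefs in place so they become absolute URLs.
        doc.make_links_absolute(base_url)
    pairs = []
    # //a[@href] skips anchors without an href attribute.
    for a in doc.xpath("//a[@href]"):
        pairs.append((a.get("href"), a.text_content().strip()))
    return pairs


# base_url is an assumed value for this example.
links = extract_links(SAMPLE_HTML, base_url="https://example.com/")
for url, text in links:
    print(url, text)
```

Passing `base_url` is optional. Without it, relative links such as `/about` are returned exactly as they appear in the page.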