I\'m using construction like this:
doc = parse(url).getroot()
links = doc.xpath(\"//a[text()=\'some text\']\")
But I need to select all lin
Because I can't stand lxml's approach to namespaces, I wrote a little method that you cam bind to the HtmlElement class.
Just import HtmlElement:
from lxml.etree import HtmlElement
Then put this in your file:
# Patch the HtmlElement class to add a function that can handle regular
# expressions within XPath queries.
def re_xpath(self, path):
return self.xpath(path, namespaces={
're': 'http://exslt.org/regular-expressions'})
HtmlElement.re_xpath = re_xpath
And then when you want to make a regular expression query, just do:
my_node.re_xpath("//a[re:match(text(), 'some text')]")
And you're off to the races. With a little more work, you could probably modify this to replace the xpath method itself, but I haven't bothered since this is working well enough.