How to use regular expression in lxml xpath?

后端 未结 3 684
伪装坚强ぢ
伪装坚强ぢ 2020-11-30 00:47

I\'m using construction like this:

doc = parse(url).getroot()
links = doc.xpath(\"//a[text()=\'some text\']\")

But I need to select all lin

3条回答
  •  旧时难觅i
    2020-11-30 01:12

    Because I can't stand lxml's approach to namespaces, I wrote a little method that you cam bind to the HtmlElement class.

    Just import HtmlElement:

    from lxml.etree import HtmlElement
    

    Then put this in your file:

    # Patch the HtmlElement class to add a function that can handle regular
    # expressions within XPath queries.
    def re_xpath(self, path):
        return self.xpath(path, namespaces={
            're': 'http://exslt.org/regular-expressions'})
    HtmlElement.re_xpath = re_xpath
    

    And then when you want to make a regular expression query, just do:

    my_node.re_xpath("//a[re:match(text(), 'some text')]")
    

    And you're off to the races. With a little more work, you could probably modify this to replace the xpath method itself, but I haven't bothered since this is working well enough.

提交回复
热议问题