How to use regular expression in lxml xpath?

我只是一个虾纸丫 提交于 2019-12-17 07:13:42

问题


I'm using construction like this:

doc = parse(url).getroot()
links = doc.xpath("//a[text()='some text']")

But I need to select all links which have text beginning with "some text", so I'm wondering is there any way to use regexp here? Didn't find anything in lxml documentation


回答1:


You can do this (although you don't need regular expressions for the example). Lxml supports regular expressions from the EXSLT extension functions. (see the lxml docs for the XPath class, but it also works for the xpath() method)

doc.xpath("//a[re:match(text(), 'some text')]", 
        namespaces={"re": "http://exslt.org/regular-expressions"})

Note that you need to give the namespace mapping, so that it knows what the "re" prefix in the xpath expression stands for.




回答2:


You can use the starts-with() function:

doc.xpath("//a[starts-with(text(),'some text')]")



回答3:


Because I can't stand lxml's approach to namespaces, I wrote a little method that you cam bind to the HtmlElement class.

Just import HtmlElement:

from lxml.etree import HtmlElement

Then put this in your file:

# Patch the HtmlElement class to add a function that can handle regular
# expressions within XPath queries.
def re_xpath(self, path):
    return self.xpath(path, namespaces={
        're': 'http://exslt.org/regular-expressions'})
HtmlElement.re_xpath = re_xpath

And then when you want to make a regular expression query, just do:

my_node.re_xpath("//a[re:match(text(), 'some text')]")

And you're off to the races. With a little more work, you could probably modify this to replace the xpath method itself, but I haven't bothered since this is working well enough.



来源:https://stackoverflow.com/questions/2755950/how-to-use-regular-expression-in-lxml-xpath

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!