How to use regular expression in lxml xpath?

伪装坚强ぢ 2020-11-30 00:47

I'm using a construction like this:

doc = parse(url).getroot()
links = doc.xpath("//a[text()='some text']")

But I need to select all lin

3 answers

旧时难觅i (original poster)

2020-11-30 01:12
Because I can't stand lxml's approach to namespaces, I wrote a little method that you can bind to the HtmlElement class.

Just import HtmlElement:
```
from lxml.html import HtmlElement
```
Then put this in your file:
```
# Patch the HtmlElement class to add a function that can handle regular
# expressions within XPath queries.
def re_xpath(self, path):
    return self.xpath(path, namespaces={
        're': 'http://exslt.org/regular-expressions'})
HtmlElement.re_xpath = re_xpath
```
And then when you want to make a regular expression query, just do:
```
my_node.re_xpath("//a[re:match(text(), 'some text')]")
```
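For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same patch. The sample HTML, the `EXSLT_RE_NS` constant, and the regex are made up for illustration. It also uses `re:test` rather than `re:match`: in EXSLT, `re:test` returns a boolean, which is what a predicate needs, while `re:match` returns the matched parts.

```python
from lxml import html
from lxml.html import HtmlElement

# EXSLT regular-expressions namespace, bound to the "re" prefix.
EXSLT_RE_NS = {"re": "http://exslt.org/regular-expressions"}


def re_xpath(self, path):
    """Run an XPath query with the EXSLT "re" prefix already registered."""
    return self.xpath(path, namespaces=EXSLT_RE_NS)


# Attach the helper to every HtmlElement (same idea as the answer's patch).
HtmlElement.re_xpath = re_xpath

# Hypothetical sample page, only for demonstration.
SAMPLE_HTML = """
<html><body>
  <a href="/a">some text</a>
  <a href="/b">some other text</a>
  <a href="/c">unrelated</a>
</body></html>
"""

doc = html.fromstring(SAMPLE_HTML)

# Select links whose text starts with "some " and ends with "text".
links = doc.re_xpath("//a[re:test(text(), '^some .*text$')]")
hrefs = [link.get("href") for link in links]
print(hrefs)
```

The regex is handed to Python's `re` module by lxml, so the usual Python regex syntax should apply.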
And you're off to the races. With a little more work, you could probably modify this to replace the xpath method itself, but I haven't bothered since this is working well enough.
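If you do want to go the extra step and replace `xpath` itself, one possible sketch looks like this. It is an untested-in-production idea, not something from the answer above: the names `xpath_with_re`, `_original_xpath`, and `EXSLT_RE_NS` are made up here, and the patch affects every `HtmlElement` in the process, so treat it with care.

```python
from lxml import html
from lxml.html import HtmlElement

# EXSLT regular-expressions namespace, bound to the "re" prefix.
EXSLT_RE_NS = {"re": "http://exslt.org/regular-expressions"}

# Keep a reference to the inherited xpath method before overriding it.
_original_xpath = HtmlElement.xpath


def xpath_with_re(self, path, namespaces=None, **kwargs):
    """Like xpath(), but the "re" prefix is always available."""
    merged = dict(EXSLT_RE_NS)
    # Caller-supplied prefixes are kept and win on conflict.
    merged.update(namespaces or {})
    return _original_xpath(self, path, namespaces=merged, **kwargs)


# Global monkeypatch: every HtmlElement.xpath call now goes through the wrapper.
HtmlElement.xpath = xpath_with_re

# Hypothetical fragment, only for demonstration.
page = html.fromstring('<div><a href="/1">item 1</a><a href="/x">item x</a></div>')

# Plain xpath() call, no namespaces argument needed for re:test.
numbered = page.xpath(r"//a[re:test(text(), '^item \d+$')]")
numbered_hrefs = [a.get("href") for a in numbered]
print(numbered_hrefs)
```

Keeping a separate `re_xpath` method, as in the answer, is the less invasive choice, since it leaves the library's own `xpath` untouched.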