Question
I am trying to use Beautiful Soup to find all <a> elements where the href attribute includes a certain string.
An example of the full element is:
<a href="/markets/NZSX/securities/ABA">ABA</a>
I am looking for all elements where href includes "/markets/NZSX/securities/".
I then want to extract the text from each matching element. This would be ABA in the example.
Answer 1:
There are several ways to achieve that. With .find_all() (the regular expression form needs import re):
soup.find_all("a", href=re.compile(r"^/markets/NZSX/securities/"))
soup.find_all("a", href=lambda href: href and href.startswith("/markets/NZSX/securities/"))
Or, with a CSS selector:
soup.select('a[href^="/markets/NZSX/securities/"]')
The above checks that the href starts with /markets/NZSX/securities/. If you want to apply a "contains" check instead:
soup.find_all("a", href=re.compile(r"/markets/NZSX/securities/"))
soup.find_all("a", href=lambda href: href and "/markets/NZSX/securities/" in href)
soup.select('a[href*="/markets/NZSX/securities/"]')
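The question also asks for the link text, which none of the calls above extract on their own. Here is a minimal end-to-end sketch that runs the three "contains" variants and pulls the text out with get_text(). The HTML snippet and the variable names (html, prefix, by_regex, by_lambda, by_css) are made up for illustration, and it assumes the built-in "html.parser" backend:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first two links should match.
html = """
<ul>
  <li><a href="/markets/NZSX/securities/ABA">ABA</a></li>
  <li><a href="/markets/NZSX/securities/AIA">AIA</a></li>
  <li><a href="/markets/NZDX/securities/XYZ">XYZ</a></li>
  <li><a name="no-href">anchor without href</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
prefix = "/markets/NZSX/securities/"

# Regex: Beautiful Soup searches the pattern anywhere in the attribute value.
# re.escape guards against regex metacharacters in the substring.
by_regex = [
    a.get_text(strip=True)
    for a in soup.find_all("a", href=re.compile(re.escape(prefix)))
]

# Function filter: href is None for tags without the attribute,
# so the "href and ..." guard avoids a TypeError.
by_lambda = [
    a.get_text(strip=True)
    for a in soup.find_all("a", href=lambda href: href and prefix in href)
]

# CSS attribute selector: *= means "contains".
by_css = [a.get_text(strip=True) for a in soup.select(f'a[href*="{prefix}"]')]

print(by_regex)
print(by_lambda)
print(by_css)
```

All three lists should contain the same text. The function filter needs the None guard because Beautiful Soup also calls it for <a> tags that have no href attribute.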
Source: https://stackoverflow.com/questions/34759129/finding-partial-matches-in-an-href-tag