How can I grab the element by matching text in its attribute in BeautifulSoup

Submitted by 微笑、不失礼 on 2020-01-11 11:33:09

Question


I have this markup:

<a title="Next Page - Results 1 to 60 " href="bla bla" class="smallfont" rel="next">&gt;</a>

I want to grab the a element and get its href.

How can I match the title attribute against "Next Page"?

I want to partially match the text in title attribute of the a element.

There are many a tags on the page similar to it; the only difference is that this one's title attribute contains "Next Page" or its text is >.


Answer 1:


You can use a regular expression to accomplish what you want.

First take the entire markup as a string and make a BeautifulSoup object with it.

Then use the find_all method of the BeautifulSoup object (findAll in older releases) as follows:

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup('<a title="Next Page - Results 1 to 60 " href="bla bla" class="smallfont" rel="next">&gt;</a>', 'html.parser')

elements = soup.find_all('a', {'title': re.compile('Next Page')})
# get all 'a' elements whose 'title' attribute contains 'Next Page' into a list

for e in elements:
    if e.string == '>':  # the parser decodes '&gt;' to '>', so compare against '>'
        print(e['href'])
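
For what it's worth, a regex is probably not the only way to do a partial match on an attribute. Below is a minimal sketch, assuming bs4 (BeautifulSoup 4) is installed, that uses a CSS attribute-substring selector with select, plus the same match written as a plain function passed to find_all. The "Previous Page" link and its href are made up here just to show that non-matching links get skipped; they are not from the question.

```python
from bs4 import BeautifulSoup

# Sample markup: the second <a> is from the question, the first is a
# hypothetical sibling link added only for illustration.
html = (
    '<a title="Previous Page - Results 1 to 60" href="prev.html" class="smallfont">&lt;</a>'
    '<a title="Next Page - Results 1 to 60 " href="bla bla" class="smallfont" rel="next">&gt;</a>'
)
soup = BeautifulSoup(html, "html.parser")

# [title*="Next Page"] matches any <a> whose title contains that substring
next_links = soup.select('a[title*="Next Page"]')
hrefs = [a["href"] for a in next_links]

# Same substring match with find_all and a function instead of a regex;
# the function receives None for <a> tags that have no title attribute
same_links = soup.find_all("a", title=lambda t: t is not None and "Next Page" in t)
same_hrefs = [a["href"] for a in same_links]

print(hrefs)
```

Either form avoids having to think about regex metacharacters if the text you are matching ever contains characters like "." or "(".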


Source: https://stackoverflow.com/questions/14064186/how-can-i-grab-the-element-by-matching-text-in-its-attribute-in-beautifulsoup
