Beautifulsoup - How to get all links inside a block with a certain class?

Submitted by 我的梦境 on 2019-12-22 07:51:56

Question


I have the following HTML DOM:

    <div class="meta-info meta-info-wide">
      <div class="title">Разработчик</div>
      <div class="content contains-text-link">
        <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
        <a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a>
        <div class="content physical-address">Diagonalstraße 41
        20537 Hamburg</div>
      </div>
    </div>

I need to get all links (URLs) with the class dev-link inside the block div.meta-info-wide.

I tried this obvious approach, but it does not work:

divTag = soup.find_all("div", {"class":"meta-info-wide"})
print(len(divTag))

for tag in divTag:
    tdTags = tag.find_all("a", {"class":"dev-link"})
    for tag in tdTags:
        print(tag.text)
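A quick check against the markup above suggests where the attempt goes wrong. Beautiful Soup treats class as a multi-valued attribute, so find_all("div", {"class": "meta-info-wide"}) does match the block, and the inner find_all does find both anchors. One likely reason the code looks broken is that tag.text returns the visible anchor text, not the URL stored in the href attribute. This is a sketch using a trimmed copy of the question's markup (the rel and target attributes are dropped because they do not affect the result):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question.
html = """
<div class="meta-info meta-info-wide">
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:info@jourist.com">Написать: info@jourist.com</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "class" is multi-valued, so asking for one class out of several still matches.
blocks = soup.find_all("div", {"class": "meta-info-wide"})
print(len(blocks))  # 1

links = blocks[0].find_all("a", {"class": "dev-link"})
texts = [a.text for a in links]     # visible anchor text, not URLs
hrefs = [a["href"] for a in links]  # the URLs themselves

print(hrefs)
```

Note that the parser decodes &amp; in the attribute value, so the first href comes back with plain & characters.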

Answer 1:


Try the following:

import bs4

html = """    
<div class="meta-info meta-info-wide"> <div class="title">Разработчик</div> <div class="content contains-text-link"> 
<a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg" rel="nofollow" target="_blank">Перейти на веб-сайт</a>
<a class="dev-link" href="mailto:info@jourist.com" rel="nofollow" target="_blank">Написать: info@jourist.com</a> 
<div class="content physical-address">Diagonalstraße 41
20537 Hamburg</div> </div> </div>"""

soup = bs4.BeautifulSoup(html, "html.parser")

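# Each block whose class list includes "meta-info-wide"...
for div in soup.find_all("div", {"class":"meta-info-wide"}):
    # ...and every <a class="dev-link"> inside that block.
    for link in div.select("a.dev-link"):
        print(link['href'])  # the URL from href, not the link text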

This gives you:

http://www.jourist.com&sa=D&usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg
mailto:info@jourist.com 
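If you do not need to handle each block separately, the two loops can be collapsed into a single descendant selector. This is a sketch using a trimmed copy of the question's markup plus one hypothetical link (http://example.com/outside) placed outside the block, added only to show that the selector stays scoped to div.meta-info-wide:

```python
from bs4 import BeautifulSoup

html = """
<div class="meta-info meta-info-wide">
  <div class="content contains-text-link">
    <a class="dev-link" href="http://www.jourist.com&amp;sa=D&amp;usg=AFQjCNHiC-nLYHAJwNnvDyYhyoeB6n8YKg">Перейти на веб-сайт</a>
    <a class="dev-link" href="mailto:info@jourist.com">Написать: info@jourist.com</a>
  </div>
</div>
<!-- Hypothetical link outside the block, only here to show scoping. -->
<a class="dev-link" href="http://example.com/outside">outside</a>
"""

soup = BeautifulSoup(html, "html.parser")

# One descendant selector: every <a class="dev-link"> anywhere inside a div
# whose class list includes "meta-info-wide". The outside link is skipped.
hrefs = [a["href"] for a in soup.select("div.meta-info-wide a.dev-link")]

for href in hrefs:
    print(href)  # prints the same two URLs as the answer's loops
```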

select() takes a CSS selector, so a.dev-link returns every <a> tag with the class dev-link inside each block. The Beautiful Soup documentation recommends CSS selectors when you need to match two or more CSS classes at once.
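A small comparison shows why selectors are the safer choice for multiple classes. It is a sketch on a trimmed version of the question's outer div, and it relies on the multi-valued class behaviour that Beautiful Soup 4 documents: a CSS selector checks each class independently, while find_all with a space-separated class string compares against the exact attribute value.

```python
from bs4 import BeautifulSoup

# Trimmed version of the question's outer block.
html = '<div class="meta-info meta-info-wide"></div>'
soup = BeautifulSoup(html, "html.parser")

# CSS selector: the div must carry both classes; their order does not matter.
both = soup.select("div.meta-info-wide.meta-info")

# find_all with a space-separated string matches the exact attribute value,
# so the same two classes written in a different order find nothing.
exact = soup.find_all("div", class_="meta-info meta-info-wide")
swapped = soup.find_all("div", class_="meta-info-wide meta-info")

print(len(both), len(exact), len(swapped))  # 1 1 0
```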

Tested with BeautifulSoup 4.5.1, Python 2.7.12



Source: https://stackoverflow.com/questions/41237467/beautifulsoup-how-to-get-all-links-inside-a-block-with-a-certain-class
