AttributeError: 'ResultSet' object has no attribute 'find_all' Beautifulsoup

那年仲夏 submitted on 2021-01-28 14:29:56

Question


I don't understand why I get this error:

I have a fairly simple function:

def scrape_a(url):
  r = requests.get(url)
  soup = BeautifulSoup(r.content)
  news =  soup.find_all("div", attrs={"class": "news"})
  for links in news:
    link = news.find_all("href")
    return link

Here is the structure of the webpage I am trying to scrape:

<div class="news">
<a href="www.link.com">
<h2 class="heading">
heading
</h2>
<div class="teaserImg">
<img alt="" border="0" height="124" src="/image">
</div>
<p> text </p>
</a>
</div>

Answer 1:


You are doing two things wrong:

  • You are calling find_all on the news result set; presumably you meant to call it on the links object, one element in that result set.

  • There are no <href ...> tags in your document, so searching with find_all('href') is not going to get you anything. You only have tags with an href attribute.
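
Both points can be checked against the markup from the question. The following is a minimal sketch, assuming the page looks like the snippet posted above; the names HTML, by_tag_name, by_attribute and hrefs are made up for illustration, and "html.parser" is just one possible parser choice.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Sample markup, trimmed from the page structure shown in the question.
HTML = """
<div class="news">
<a href="www.link.com">
<h2 class="heading">heading</h2>
<p> text </p>
</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
news = soup.find_all("div", attrs={"class": "news"})

# find_all returns a ResultSet, which is a list subclass holding Tag objects.
is_result_set = isinstance(news, ResultSet)
is_list = isinstance(news, list)

# Each element of the result set is a Tag, and Tag is what provides find_all.
first_is_tag = isinstance(news[0], Tag)

# There is no <href> element, so searching by tag name finds nothing...
by_tag_name = news[0].find_all("href")

# ...while href=True matches any tag that carries an href attribute.
by_attribute = news[0].find_all(href=True)
hrefs = [tag["href"] for tag in by_attribute]
```

Calling news.find_all(...) fails because the list-like ResultSet has no such method, whereas news[0].find_all(...) works because news[0] is a Tag.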

You could correct your code to:

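import requests
from bs4 import BeautifulSoup

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    news =  soup.find_all("div", attrs={"class": "news"})
    for links in news:
        # links is a single Tag from the result set, so find_all works here;
        # href=True matches any tag that has an href attribute
        link = links.find_all(href=True)
        # returns on the first iteration, i.e. the links of the first news div
        return link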

to do what I think you tried to do.

I'd use a CSS selector:

def scrape_a(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    news_links = soup.select("div.news [href]")
    if news_links:
        return news_links[0]

If you wanted to return the value of the href attribute (the link itself), you need to extract that too, of course:

return news_links[0]['href']

If you needed all the link objects, and not the first, simply return news_links for the link objects, or use a list comprehension to extract the URLs:

return [link['href'] for link in news_links]
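
Put together, the list-comprehension variant might look like the sketch below. It parses a string rather than fetching a URL so it can be run offline; the function name extract_news_links and the SAMPLE markup (including the second news block and the link outside any news block) are assumptions for illustration, not part of the original page.

```python
from bs4 import BeautifulSoup


def extract_news_links(html):
    """Return the href value of every element with an href inside a div.news."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.news [href]" selects any descendant of a div with class "news"
    # that has an href attribute.
    return [link["href"] for link in soup.select("div.news [href]")]


# Hypothetical sample: two news blocks plus one link outside of them.
SAMPLE = """
<a href="www.ignored.com">not inside a news div</a>
<div class="news">
<a href="www.link.com"><h2 class="heading">heading</h2></a>
</div>
<div class="news">
<a href="www.other.com"><h2 class="heading">another heading</h2></a>
</div>
"""

links = extract_news_links(SAMPLE)
print(links)
```

To use this with the original function, pass r.content from requests.get(url) as the html argument.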


Source: https://stackoverflow.com/questions/32474842/attributeerror-resultset-object-has-no-attribute-find-all-beautifulsoup
