BeautifulSoup not working, getting NoneType error

Question

I am using the following code (taken from "retrieve links from web page using python and BeautifulSoup"):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']

However, I don't understand why I am getting the following error message:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

Versions: BeautifulSoup 3.2.0, Python 2.7

EDIT:

I tried the solution from a similar question ("Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable"), but it gives me the following error:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable

Answer 1:

First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:

pip install beautifulsoup4

and change your import to:

from bs4 import BeautifulSoup, SoupStrainer

Also:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

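Here link is a Tag instance, and in BeautifulSoup 3 a Tag has no has_attr method. Recall what dot notation means in BeautifulSoup: an attribute name the Tag does not define is treated as a search for a child element with that name. So link.has_attr looks for a has_attr element inside the link element, finds nothing, and evaluates to None. Calling it is then None('href'), which raises the TypeError. The error in your EDIT has the same cause: find_all is the BeautifulSoup 4 name (version 3 calls it findAll), so in version 3 BeautifulSoup(response).find_all is also None.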
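To make the mechanism concrete, here is a minimal sketch of the attribute fallback described above. ToyTag is a hypothetical stand-in written for illustration only; it is not BeautifulSoup's actual source, and it searches direct children only. It assumes nothing beyond the standard library.

```python
class ToyTag:
    """Hypothetical stand-in for a BeautifulSoup 3 Tag (illustration only)."""

    def __init__(self, name, attrs=None, children=None):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children or [])

    def find(self, name):
        # Return the first direct child tag with the given name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails, so any
        # unknown attribute name is treated as a child-tag search.
        if name.startswith('__'):
            raise AttributeError(name)
        return self.find(name)


link = ToyTag('a', attrs={'href': 'http://example.com'}, children=[ToyTag('b')])

print(link.b.name)    # a child tag named "b" exists, so it is returned
print(link.has_attr)  # no child named "has_attr", so the lookup yields None

try:
    link.has_attr('href')  # equivalent to None('href')
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

Because the fallback silently returns None instead of raising AttributeError, a misspelled or version-specific method name only fails at the call site, which is why the traceback mentions NoneType rather than a missing method.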

Instead, do:

soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])

FYI, here is a complete working code that I used to debug your problem (using requests):

import requests
from bs4 import BeautifulSoup, SoupStrainer


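# Download the page body as bytes
response = requests.get('http://www.nytimes.com').content
# parse_only is the BeautifulSoup 4 name for the old parseOnlyThese argument
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
    print(link['href'])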

Source: https://stackoverflow.com/questions/35733853/beautifulsoup-not-working-getting-nonetype-error

Tags

python

html

python-3.x

beautifulsoup

html-parsing