Question
I am using the following code (taken from "retrieve links from web page using python and BeautifulSoup"):
import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']
However, I don't understand why I am getting the following error message:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable
Environment: BeautifulSoup 3.2.0, Python 2.7.
EDIT:
I tried the solution from a similar question ("Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable"), but it gives me the following error:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable
Answer 1:
First of all:
from BeautifulSoup import BeautifulSoup, SoupStrainer
You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it via:
pip install beautifulsoup4
and change your import to:
from bs4 import BeautifulSoup, SoupStrainer
Also:
Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable
Here link is a Tag instance, which in BeautifulSoup 3 does not have a has_attr method. In BeautifulSoup, dot notation on a tag means "find a child element with this name", so link.has_attr searches for a has_attr element inside the link element and finds nothing. In other words, link.has_attr is None, and None('href') raises the TypeError. The find_all call in your edit fails the same way, because BeautifulSoup 3 spells that method findAll.
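To make the mechanism concrete, here is a minimal sketch of how a __getattr__ fallback turns an unknown method name into None. ToyTag is a hypothetical stand-in written for this illustration; it is not BeautifulSoup's actual code, and it only assumes that unknown attributes are routed to a child-tag search as described above.

```python
class ToyTag:
    """Hypothetical stand-in for a BeautifulSoup 3 Tag; not the library's real code."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with the given name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails, so every
        # unknown attribute is treated as a search for a child tag.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


link = ToyTag("a", children=[ToyTag("span")])

print(link.span.name)  # an existing child tag is found
print(link.has_attr)   # no child named "has_attr", so the lookup yields None

try:
    link.has_attr("href")  # equivalent to None("href")
except TypeError as exc:
    print(exc)
```

The point of the sketch is that no AttributeError is raised for the misspelled or missing method, which is why the failure surfaces as a confusing TypeError one step later.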
Instead, do:
soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])
FYI, here is the complete working code that I used to debug your problem (using requests):
import requests
from bs4 import BeautifulSoup, SoupStrainer

response = requests.get('http://www.nytimes.com').content

# parse_only is the bs4 name for BeautifulSoup 3's parseOnlyThese argument
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
    print(link['href'])
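If you want to check the parsing logic without hitting the network, a small offline sketch along these lines may help. The HTML string is made up for illustration, and passing "html.parser" explicitly is my own choice here (it selects the parser bundled with Python); neither comes from the original question.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Made-up HTML used only for illustration.
html = """
<p><a href="/world">World</a>
<a name="top">anchor without href</a>
<a href="https://example.com/tech">Tech</a></p>
"""

# Parse only <a> tags that carry an href attribute.
only_links = SoupStrainer("a", href=True)
soup = BeautifulSoup(html, "html.parser", parse_only=only_links)

# href=True keeps only tags where the attribute is present,
# so no has_attr check is needed inside the loop.
hrefs = [link["href"] for link in soup.find_all("a", href=True)]
print(hrefs)
```

The anchor without an href is skipped, so indexing link["href"] is safe for every tag that reaches the loop.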
Source: https://stackoverflow.com/questions/35733853/beautifulsoup-not-working-getting-nonetype-error