I am working on a link-checking script to monitor a domain I manage. I am getting an error when roughly the 9th URL is run through the findLinks() function.
This is working for me and I am past the error now:
for link in soup.find_all('a'):
    if link.get('href'):
        # collect absolute-path hrefs and add them to the array
        if "google.com" in link.get('href'):
            linksToCrawl.append(link.get('href'))
This is similar to testing for is not None, although that did not work for me. Thanks, all.
You cannot iterate over None (that is, use the in keyword to check its contents), and None is what get() returns by default when it cannot find the given attribute name, so passing an empty list as the default (second argument) prevents the error:
for link in soup.find_all('a'):
    # collect absolute-path hrefs and add them to the array
    if "google.com" in link.get('href', []):
        linksToCrawl.append(link.get('href'))
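To see why the empty-list default matters, here is a minimal plain-Python snippet; a dict's get() behaves the same way here as a BeautifulSoup tag's get():

tag_attrs = {}  # stand-in for an <a> tag that has no href attribute

# tag_attrs.get('href') returns None, and `in` against None raises:
# "google.com" in tag_attrs.get('href')   # TypeError: argument of type 'NoneType' is not iterable

# with an empty-list default, the membership test is simply False:
print("google.com" in tag_attrs.get('href', []))  # prints False, no exception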
You may still want to confirm that link.get('href') returns something truthy before you get this far into the function.
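For example, a minimal sketch of what that guard might look like (the findLinks() signature and the linksToCrawl argument are assumed here, since the original function isn't shown):

from bs4 import BeautifulSoup

def findLinks(soup, linksToCrawl):
    # Hypothetical sketch: the signature is assumed, not the original findLinks().
    for link in soup.find_all('a'):
        href = link.get('href')
        if not href:
            continue  # skip <a> tags with a missing or empty href
        if "google.com" in href:
            linksToCrawl.append(href)

links = []
findLinks(BeautifulSoup('<a href="https://google.com/a">a</a><a>no href</a>', 'html.parser'), links)
print(links)  # ['https://google.com/a']

Binding link.get('href') to a local variable also avoids calling get() twice per tag.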