Python, Mechanize - request disallowed by robots.txt even after set_handle_robots and add_headers
I have made a web crawler that collects all links down to the first level of a page, and from those pages it extracts every link and its text, plus image links and their alt attributes. Here is the whole code:

```python
import urllib
import re
import time
from threading import Thread
import MySQLdb
import mechanize
import readability
from bs4 import BeautifulSoup
from readability.readability import Document
import urlparse

url = ["http://sparkbrowser.com"]

i = 0
while i < len(url):
    counterArray = [0]
    levelLinks = []
    linkText = ["homepage"]
    levelLinks = []

    def scraper(root, steps):
        urls = [root]
        visited = [root]
        counter = 0
        while counter < steps:
            step_url = ...  # the paste is cut off at this point
```
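The mechanize setup I am using looks roughly like this (a minimal sketch of the relevant part, since the paste above is cut off before it; the browser variable name and the exact User-Agent string are placeholders):

```python
import mechanize

# Assumed setup, reconstructed from the title: set_handle_robots(False)
# should stop mechanize from checking robots.txt, and addheaders replaces
# the default "Python-urllib" User-Agent with a browser-like one.
br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_redirect(True)  # follow HTTP redirects
br.set_handle_referer(True)   # send a Referer header
br.addheaders = [('User-Agent',
                  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36')]

response = br.open("http://sparkbrowser.com")
print response.read()[:200]
```

Even with this configuration I still get the "request disallowed by robots.txt" error on some of the crawled pages.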