I have the following code for a web crawler in Python 3:
import requests
from bs4 import BeautifulSoup
import re
def get_links(link):
return_links = []
There is an entire section of scrapy guide dedicated to broad crawls. I suggest you to fine-grain your settings for doing this succesfully.
For recreating the behaviour you need in scrapy, you must
An untested example (that can be, of course, refined):
class AllSpider(scrapy.Spider):
name = 'all'
start_urls = ['https://yourgithub.com']
def __init__(self):
self.links=[]
def parse(self, response):
self.links.append(response.url)
for href in response.css('a::attr(href)'):
yield response.follow(href, self.parse)