Scrape Multiple URLs using Beautiful Soup

南笙 2021-02-02 02:56

I'm trying to extract specific classes from multiple URLs. The tags and classes stay the same on every page, but I need my Python program to scrape all of the pages when I just input the links.

Here'…

2 Answers
  •  庸人自扰
    2021-02-02 03:18

    If you want to scrape the links in batches, specify a batch size and iterate over the list one chunk at a time.

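    from bs4 import BeautifulSoup
    import requests

    batch_size = 5
    urllist = ["url1", "url2", "url3"]  # ... add the rest of your URLs here

    # Split the URL list into chunks of at most batch_size URLs each
    url_chunks = [urllist[x:x + batch_size] for x in range(0, len(urllist), batch_size)]

    def scrape_url(url):
        """Fetch one page and return the text of its <h1 class="class-headline">."""
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on 4xx/5xx responses
        soup = BeautifulSoup(response.content, "html.parser")
        h1 = soup.find("h1", class_="class-headline")
        # find() returns None when the page has no matching tag
        return h1.get_text(strip=True) if h1 is not None else None

    def scrape_batch(url_chunk):
        """Scrape every URL in one chunk, keeping the results in input order."""
        return [scrape_url(url) for url in url_chunk]

    for url_chunk in url_chunks:
        print(scrape_batch(url_chunk))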
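
The slicing approach needs the whole URL list up front. If the links arrive one at a time (for example pasted into a file or typed on standard input), a generator built on `itertools.islice` can group any iterable into batches without calling `len()`. This is a sketch under that assumption: `chunked` and `read_urls` are hypothetical helper names, not part of Beautiful Soup or requests, and the example.com URLs are placeholders.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(items: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: yield lists of at most batch_size items from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        # islice pulls up to batch_size items without needing len() or indexing
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def read_urls(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: yield stripped, non-empty lines (one URL per line)."""
    for line in lines:
        url = line.strip()
        if url:
            yield url


if __name__ == "__main__":
    # Placeholder input standing in for links the user pastes or types
    pasted = """
    https://example.com/a
    https://example.com/b

    https://example.com/c
    """
    for batch in chunked(read_urls(pasted.splitlines()), 2):
        print(batch)
```

Each `batch` could then be passed to `scrape_batch` from the answer above, and `read_urls(sys.stdin)` would read links as they are entered. On Python 3.12 and later, `itertools.batched` offers the same grouping, yielding tuples instead of lists.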
    
