Download all csv files from URL

Submitted by 久未见 on 2020-01-05 03:50:12

Question


I want to download all the CSV files linked from this page. Any idea how I do this?

from bs4 import BeautifulSoup
import requests

url = requests.get('http://www.football-data.co.uk/englandm.php').text
soup = BeautifulSoup(url, "html.parser")
for link in soup.find_all("a"):
    print(link.get("href"))

Answer 1:


You just need to filter the hrefs, which you can do with the CSS selector a[href$=".csv"]; it matches every href ending in .csv. Then join each one to the base URL, request it, and finally write the content:

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin  # urlparse.urljoin in Python 2
from os.path import basename

base = "http://www.football-data.co.uk/"
url = requests.get('http://www.football-data.co.uk/englandm.php').text
soup = BeautifulSoup(url, "html.parser")
for link in (urljoin(base, a["href"]) for a in soup.select('a[href$=".csv"]')):
    # The response body is bytes, so open the file in binary mode.
    with open(basename(link), "wb") as f:
        f.write(requests.get(link).content)

This will give you five files, E0.csv, E1.csv, E2.csv, E3.csv, and E4.csv, with all the data inside.
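One caveat, not in the original answer: basename(link) keeps only the file name, so if the page ever links files with the same name from several seasons (e.g. mmz4281/1617/E0.csv and mmz4281/1516/E0.csv), later downloads would overwrite earlier ones. A minimal sketch that instead flattens the relative path into a unique local name, assuming the relative href layout this site currently uses:

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin

base = "http://www.football-data.co.uk/"
soup = BeautifulSoup(requests.get(base + "englandm.php").text, "html.parser")
for a in soup.select('a[href$=".csv"]'):
    href = a["href"]             # e.g. "mmz4281/1617/E0.csv" (assumed layout)
    fn = href.replace("/", "_")  # "mmz4281_1617_E0.csv" -- unique per season
    with open(fn, "wb") as f:
        f.write(requests.get(urljoin(base, href)).content)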




Answer 2:


Something like this should work:

from bs4 import BeautifulSoup
from time import sleep
import requests


if __name__ == '__main__':
    url = requests.get('http://www.football-data.co.uk/englandm.php').text
    soup = BeautifulSoup(url, "html.parser")
    for link in soup.find_all("a"):
        current_link = link.get("href")
        # Guard against anchors with no href, which would return None.
        if current_link and current_link.endswith('.csv'):
            print('Found CSV: ' + current_link)
            print('Downloading %s' % current_link)
            sleep(10)  # be polite: pause between requests
            response = requests.get('http://www.football-data.co.uk/%s' % current_link, stream=True)
            # Flatten the relative path (e.g. mmz4281/1617/E0.csv) into a local file name.
            fn = current_link.replace('/', '_')
            with open(fn, "wb") as handle:
                for data in response.iter_content(chunk_size=8192):
                    handle.write(data)
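A further hedged refinement, not part of either answer: if a link is dead or the server returns an error page, the loop above will happily save that error page as a .csv. A minimal sketch (the helper name download_csv is my own) that checks the HTTP status and skips failures instead:

import requests
from urllib.parse import urljoin

BASE = "http://www.football-data.co.uk/"

def download_csv(href, dest):
    """Fetch one CSV relative to BASE and save it; return True on success."""
    try:
        response = requests.get(urljoin(BASE, href), stream=True, timeout=30)
        response.raise_for_status()  # reject 404s etc. rather than saving them
    except requests.RequestException as exc:
        print('Skipping %s: %s' % (href, exc))
        return False
    with open(dest, "wb") as handle:
        for chunk in response.iter_content(chunk_size=8192):
            handle.write(chunk)
    return True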


Source: https://stackoverflow.com/questions/39033674/download-all-csv-files-from-url
