Download file from Blob URL with Python

死守一世寂寞 2021-01-03 17:33

I want my Python script to download the Master data (Download, XLSX) Excel file from this Frankfurt Stock Exchange webpage.

When to retrieve it with

2 answers
  • 2021-01-03 17:49
    import re

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.xetra.com/xetra-en/instruments/etf-exchange-traded-funds/list-of-tradable-etfs'
    html = requests.get(url)
    # Name the parser explicitly to avoid bs4's "no parser specified" warning.
    page = BeautifulSoup(html.content, 'html.parser')
    reg = re.compile('Master data')
    # Find the <span> labelled "Master data"; its parent element carries the href.
    find = page.find('span', string=reg)
    file_url = 'http://www.xetra.com' + find.parent['href']
    file = requests.get(file_url)
    # Raw string, so each backslash is written only once.
    with open(r'C:\Users\user\Downloads\file.xlsx', 'wb') as ff:
        ff.write(file.content)
    

    I recommend requests and BeautifulSoup; both are good libraries.
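
The string concatenation in the snippet above assumes that the link's `href` is a site-relative path. A minimal sketch of a more tolerant way to build the download URL uses `urllib.parse.urljoin` from the standard library, which also copes with an `href` that is already absolute. The helper name `absolute_url` and the sample `href` values are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

BASE_URL = 'http://www.xetra.com'  # site root used in the answer above


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from the page against the site root (hypothetical helper)."""
    # urljoin leaves an absolute href unchanged and resolves a relative one.
    return urljoin(base, href)


# Hypothetical href values, not taken from the real page.
print(absolute_url('/blob/example/file.xlsx'))
print(absolute_url('http://www.xetra.com/blob/example/file.xlsx'))
```

In the answer's code this would replace the `'http://www.xetra.com' + find.parent['href']` expression with `absolute_url(find.parent['href'])`.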

  • 2021-01-03 17:54

    That 289-byte response might be the HTML of a 403 Forbidden page. This happens because the server rejects requests that do not specify a user agent.

    Python 3

    # python3
    import urllib.request as request
    
    url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
    # fake user agent of Safari
    fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
    r = request.Request(url, headers={'User-Agent': fake_useragent})
    f = request.urlopen(r)
    
    # print or write
    print(f.read())
    

    Python 2

    # python2
    import urllib2
    
    url = 'http://www.xetra.com/blob/1193366/b2f210876702b8e08e40b8ecb769a02e/data/All-tradable-ETFs-ETCs-and-ETNs.xlsx'
    # fake user agent of safari
    fake_useragent = 'Mozilla/5.0 (iPad; CPU OS 6_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/6.0 Mobile/10A5355d Safari/8536.25'
    
    r = urllib2.Request(url, headers={'User-Agent': fake_useragent})
    f = urllib2.urlopen(r)
    
    print(f.read())
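
Both snippets above only print the response body. What follows is a minimal sketch of the "write" half with a sanity check, assuming the download is a normal .xlsx workbook: such a file is a ZIP archive and starts with the signature `PK\x03\x04`, whereas an HTML error page like the 289-byte one mentioned above does not. The function names `looks_like_xlsx` and `save_if_xlsx` and the output file name are hypothetical.

```python
ZIP_SIGNATURE = b'PK\x03\x04'  # local file header magic at the start of a ZIP archive


def looks_like_xlsx(data):
    """Return True if the bytes start with the ZIP signature, as a normal .xlsx does."""
    return data.startswith(ZIP_SIGNATURE)


def save_if_xlsx(data, path):
    """Write data to path in binary mode; refuse anything that is not ZIP-shaped."""
    if not looks_like_xlsx(data):
        raise ValueError(
            'response is %d bytes and does not look like an .xlsx file' % len(data)
        )
    with open(path, 'wb') as out:
        out.write(data)


# Usage with the Python 3 snippet above (assumes f is the opened response):
# save_if_xlsx(f.read(), 'All-tradable-ETFs-ETCs-and-ETNs.xlsx')
```

Raising on a failed check keeps an error page from being saved silently under an .xlsx name.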
    