Download a CSV from a URL and make it a DataFrame (Python pandas)

Submitted on 2020-06-16 17:31:31

Question


I am new to Python, so I need a little help here. I have a dataframe with a URL column whose links let me download a CSV for each row. My aim is to create a loop (or whatever works) so that I can run one command that downloads and reads each CSV and creates a dataframe for each of the rows. Any help would be appreciated. I have attached part of the dataframe below. If the links don't work (they probably won't), you can go to 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (or any other company), navigate to the download-CSV link, and use that URL instead.

Dataframe:

Symbol         Link
YI             https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH            https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN           https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS           https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
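
For reference, a frame like the sample above can be rebuilt by hand; the symbols and links come straight from the table, though the crumb value has almost certainly expired:

import pandas as pd

base = ('https://query1.finance.yahoo.com/v7/finance/download/{}'
        '?period1=1383609600&period2=1541376000&interval=1d'
        '&events=history&crumb=PMHbxK/sU6E')
symbols = ['YI', 'PIH', 'TURN', 'FLWS']
df = pd.DataFrame({'Symbol': symbols, 'Link': [base.format(s) for s in symbols]})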

Thanks again.


Answer 1:


You need a POST request, and then you pass the response contents to pandas through io.

import pandas as pd
import requests
import io

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params = {'period1': 1538761929,
          'period2': 1541443929,
          'interval': '1d',
          'events': 'history',
          'crumb': 'v4z6ZpmoP98',
          }

r = requests.post(url, data=params)
if r.ok:
    data = r.content.decode('utf8')       # raw CSV text
    df = pd.read_csv(io.StringIO(data))   # parse it into a DataFrame

To get the params, I just followed the link and copied everything after the '?'. Check that they match ;)
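
If you would rather not copy them by hand, the query string of each link can be split into a params dict programmatically. A small sketch using only the standard library, with one of the question's links as input:

from urllib.parse import urlsplit, parse_qsl

link = 'https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E'
base_url = link.split('?', 1)[0]                 # endpoint without the query string
params = dict(parse_qsl(urlsplit(link).query))   # {'period1': '1383609600', ..., 'crumb': 'PMHbxK/sU6E'}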


Update:

If the raw CSV contents are served directly at the URL, just pass the URL to pd.read_csv. Example with data read directly from a URL:

data_url = 'https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'

df = pd.read_csv(data_url)



Answer 2:


I routinely use this procedure:

import pandas as pd
import requests
import io

url = "<URL TO DOWNLOAD.CSV>"
s = requests.get(url).content                      # raw bytes of the CSV
c = pd.read_csv(io.StringIO(s.decode('utf-8')))    # decode and wrap so read_csv gets a file-like object
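
The same recipe can be applied to every row of the question's frame. A sketch, where df stands for the question's dataframe of symbols and links; for Yahoo the links only work while the cookie/crumb are valid:

import pandas as pd
import requests
import io

frames = {}
for symbol, link in zip(df['Symbol'], df['Link']):
    s = requests.get(link).content
    frames[symbol] = pd.read_csv(io.StringIO(s.decode('utf-8')))
# frames['YI'], frames['PIH'], ... each hold one downloaded CSV as a DataFrame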



Answer 3:


First break the task down into smaller parts. What you need to do is:

  1. Iterate over the DataFrame with the links.

    for index, row in df.iterrows():
        url = row["Link"]
    
  2. Download the CSV file from Yahoo Finance using Python's requests library. This is probably the difficult part: you will need to get cookies before actually downloading the CSV file (more info here, here and here). Once you create the proper URL with the cookie and crumb, you can download it with (see the sketch after this list for a full loop):

    re = requests.get(URL)
    print(re.status_code) #status code 200 for successful download
    
  3. Optionally, you can save the response to your local disk.
  4. Load it with pandas.

    df = pd.read_csv(file_name)              # in case of saving the file to disk
    df = pd.read_csv(io.StringIO(re.text))   # directly from the response (requires `import io`)
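
Putting the four steps together, a rough end-to-end sketch could look like the following. It assumes df is the question's frame of symbols and links, and that the cookie/crumb baked into each 'Link' are still valid (Yahoo rejects the request otherwise):

import io
import pandas as pd
import requests

downloaded = {}
for index, row in df.iterrows():                 # step 1: iterate over the links
    url = row["Link"]
    re = requests.get(url)                       # step 2: download the CSV
    print(re.status_code)                        # status code 200 for a successful download
    if re.ok:
        file_name = row["Symbol"] + ".csv"       # step 3 (optional): save it to disk
        with open(file_name, "wb") as f:
            f.write(re.content)
        downloaded[row["Symbol"]] = pd.read_csv(io.StringIO(re.text))  # step 4: load it with pandas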
    



Answer 4:


If you apply the following to the dataframe, it will place each of the documents in an np.array, not in a dataframe (I'm unsure of how to get there). But this will give you access to all the files, and it's only a matter of putting them in a df.

import requests

links = test['Link'].unique()

a = []
for x in links:
    url = x
    s = requests.get(url).content   # raw bytes of each downloaded file
    a.append(s)

a[4] or np.array(a[4]).tolist() outputs the entire file, just in the wrong format.
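
To take the last step and turn those raw files into dataframes, the collected bytes can be decoded and parsed one by one. A sketch reusing the links and a variables from the snippet above:

import io
import pandas as pd

frames = {link: pd.read_csv(io.StringIO(content.decode('utf-8')))
          for link, content in zip(links, a)}
# each frames[link] is now an ordinary DataFrame built from the downloaded bytes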

Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.
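
Since that endpoint serves plain CSV, it can be read directly, just like the iris example above (assuming the API is still online):

import pandas as pd

goog = pd.read_csv('https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv')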



Source: https://stackoverflow.com/questions/53158452/download-a-csv-from-url-and-make-it-a-dataframe-python-pandas
