Question
I am new to Python, so I need a little help here. I have a dataframe with a URL column where each link lets me download a CSV. My aim is to create a loop (or whatever works) so that I can run one command that downloads the CSV, reads it, and creates a dataframe for each of the rows. Any help would be appreciated. I have attached part of the dataframe below. If the link doesn't work (it probably won't), you can replace it with a link from 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (any other company works too): navigate to Download CSV and use that link.
Dataframe:
Symbol Link
YI https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
Thanks again.
Answer 1:
You need a POST request, then feed the response contents to io:
import io

import pandas as pd
import requests

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params = {
    'period1': 1538761929,
    'period2': 1541443929,
    'interval': '1d',
    'events': 'history',
    'crumb': 'v4z6ZpmoP98',
}

r = requests.post(url, data=params)
if r.ok:
    data = r.content.decode('utf8')
    df = pd.read_csv(io.StringIO(data))
To get the params, I just followed the link and copied everything after '?'. Check that they match ;)
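Instead of copying by hand, the query string can also be split out with the standard library (a small illustration, not part of the original answer; the example link is the one from the code above):

```python
from urllib.parse import urlparse, parse_qs

link = ("https://query1.finance.yahoo.com/v7/finance/download/GOOG"
        "?period1=1538761929&period2=1541443929&interval=1d"
        "&events=history&crumb=v4z6ZpmoP98")

parsed = urlparse(link)
# parse_qs returns a list per key; keep the single value for each one.
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}

print(parsed.path)          # /v7/finance/download/GOOG
print(params["interval"])   # 1d
```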
Update:
If you can see the raw CSV contents directly at the URL, just pass the URL to pd.read_csv.
Example data directly from url:
data_url ='https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'
df = pd.read_csv(data_url)
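To cover the original ask (one dataframe per row of the links table), the same approach can be wrapped in a loop over the Link column. A minimal sketch, assuming the dataframe of symbols and links is named df as in the question and the URLs are directly downloadable; the helper name fetch_csv_frames is just illustrative:

```python
import io

import pandas as pd
import requests

def fetch_csv_frames(links_df):
    """Return a dict mapping each Symbol to the DataFrame parsed from its Link."""
    frames = {}
    for _, row in links_df.iterrows():
        resp = requests.get(row["Link"])
        if resp.ok:
            frames[row["Symbol"]] = pd.read_csv(io.StringIO(resp.text))
    return frames
```

After frames = fetch_csv_frames(df), frames["YI"] (for example) holds that symbol's price history as its own dataframe.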
Answer 2:
I routinely use this procedure:
import io

import pandas as pd
import requests

url = "<URL TO DOWNLOAD.CSV>"
s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))
Answer 3:
First break the task down into smaller parts. What you need to do is:
1. Iterate over the DataFrame with the links:

   for index, row in df.iterrows():
       url = row["Link"]

2. Download the CSV file from Yahoo Finance using Python's requests library. This is probably the difficult part: you will need to get cookies before actually downloading the CSV file (more info here, here and here). Once you create the proper URL with the cookie, you can download it with:

   re = requests.get(url)
   print(re.status_code)  # status code 200 for a successful download

   Optionally, you can save the response to your local disk.

3. Load it with pandas:

   df = pd.read_csv(file_name)                                # in case of saving the file to disk
   df = pd.read_csv(io.StringIO(re.content.decode('utf-8')))  # directly from the response
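The optional save-to-disk step can be sketched as follows; the bytes literal stands in for the content of a successful response, and the file name is just illustrative:

```python
import pandas as pd

csv_bytes = b"Date,Close\n2018-01-01,100\n"  # stand-in for re.content

# Save the raw response body, then load it back with pandas.
with open("goog_history.csv", "wb") as f:
    f.write(csv_bytes)

df = pd.read_csv("goog_history.csv")
print(df.shape)  # (1, 2)
```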
Answer 4:
If you apply the following to the dataframe, it will place each of the documents in a list (not in a dataframe; I'm unsure of how to get there). But this will give you access to all the files, and it's only a matter of putting them in a df.
import requests

links = test['Link'].unique()

a = []
for x in links:
    s = requests.get(x).content
    a.append(s)
a[4] (or np.array(a[4]).tolist()) outputs the entire file, just in the wrong format.
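To finish the "putting them in a df" step, each raw payload in a can be wrapped in io.BytesIO and handed to pd.read_csv. A sketch, with an inline bytes literal standing in for one downloaded file:

```python
import io

import pandas as pd

# Stand-in for the list built by the loop above: one raw CSV payload per link.
a = [b"Date,Close\n2018-01-01,100\n2018-01-02,101\n"]

frames = [pd.read_csv(io.BytesIO(raw)) for raw in a]
print(frames[0]["Close"].tolist())  # [100, 101]
```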
Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.
Source: https://stackoverflow.com/questions/53158452/download-a-csv-from-url-and-make-it-a-dataframe-python-pandas