Question
I would like to read data from nseindia.com into R using download.file(), as shown below.
url = 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
temp = tempfile()
download.file(url, destfile = temp,method = 'wget')
It throws the following error:
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\PROGRA~2\GnuWin32/etc/wgetrc
--2014-09-28 21:19:26-- http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip
Resolving www.nseindia.com... 202.83.22.200, 202.83.22.203
Connecting to www.nseindia.com|202.83.22.200|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2014-09-28 21:19:26 ERROR 403: Forbidden.
Warning messages:
1: running command 'wget "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" -O "C:\Users\ITITHI~1\AppData\Local\Temp\Rtmp2fjADx\file1fb02375882"' had status 1
2: In download.file(url, destfile = temp, method = "wget") :
download had nonzero exit status
Please let me know of any way to fix this.
EDIT: Or any other method to download the file from within R would also be great.
Answer 1:
You need to set a browser-like user agent string so that the site treats the request as coming from a browser rather than from an automated scraper or downloader:
library(httr) # requires httr >= 0.5
GET("http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip",
    user_agent("Mozilla/5.0"),
    write_disk("cm24SEP2014bhav.csv.zip"))
## Response [http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip]
## Date: 2014-09-28 23:53
## Status: 200
## Content-type: application/zip
## Size: 58.2 kB
## <ON DISK> cm24SEP2014bhav.csv.zip
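Since the original goal was to read the data into R, the call above can be extended into a rough sketch that downloads to a temporary file, stops on an HTTP error, and reads the CSV straight out of the archive. The helper name `read_zipped_csv` is hypothetical (it is not part of httr), and the code assumes the archive contains a single CSV file.

```r
library(httr)

# Hypothetical helper: download a zipped CSV with a browser-like user agent
# and return its contents as a data frame.
read_zipped_csv <- function(url, agent = "Mozilla/5.0") {
  zip_path <- tempfile(fileext = ".zip")
  on.exit(unlink(zip_path), add = TRUE)  # remove the temporary archive afterwards

  resp <- GET(url, user_agent(agent), write_disk(zip_path, overwrite = TRUE))
  stop_for_status(resp)  # raise an R error on 403 and other HTTP failures

  # Assumption: the first entry in the archive is the CSV we want
  csv_name <- unzip(zip_path, list = TRUE)$Name[1]
  read.csv(unz(zip_path, csv_name), stringsAsFactors = FALSE)
}

# Example call (commented out so that sourcing this block does not touch the network):
# bhav <- read_zipped_csv("http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip")
# head(bhav)
```

Calling stop_for_status() before unzipping means a 403 response surfaces as an R error instead of an attempt to unzip an HTML error page.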
Answer 2:
You need permission to access that site. Here is the message (stored in doc) that the server returns when the file is requested with the httr package:
library(httr)
url = 'http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip'
doc <- content(GET(url))
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>Access Denied</title></head>
<body>
<h1>Access Denied</h1>
You don't have permission to access "http://www.nseindia.com/content/historical/EQUITIES/2014/SEP/cm24SEP2014bhav.csv.zip" on this server.<p>
Reference #18.df24317.1411924047.3b4f02a1
</p>
</body>
</html>
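Because content() will happily parse an error page, it may help to look at the status code and content type before touching the body. The following is only a sketch: `inspect_response` is a hypothetical helper name, and the values it returns depend on how the server treats the request at the time.

```r
library(httr)

# Hypothetical helper: report what the server actually sent back,
# without parsing the body. Extra arguments are passed on to GET().
inspect_response <- function(url, ...) {
  resp <- GET(url, ...)
  list(
    status = status_code(resp),  # e.g. 200 on success, 403 when access is denied
    type   = http_type(resp),    # media type from the Content-Type header
    failed = http_error(resp)    # TRUE for status codes of 400 and above
  )
}

# Example calls (commented out so that sourcing this block does not touch the network):
# inspect_response(url)                             # default httr user agent
# inspect_response(url, user_agent("Mozilla/5.0"))  # browser-like user agent
```

A text/html type together with a 403 status would indicate that doc holds the "Access Denied" page rather than the zip file.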
Source: https://stackoverflow.com/questions/26086868/error-downloading-a-csv-in-zip-from-website-with-get-in-r