Apply function that downloads zip files and deletes specific files

Submitted by 烂漫一生 on 2020-01-04 05:41:14

Question


I am trying to write a function and call it with apply on each row of my dataset. The dataset contains URLs of zip files. Each file should be downloaded and unzipped, and afterwards the TXT and zip files should be deleted from the working directory.

head(data)
                                                 data                                                                   URL
1 /files/market_valuation/ru/2017/val170502170509.zip http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip
2 /files/market_valuation/ru/2017/val170424170430.zip http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip
3 /files/market_valuation/ru/2017/val170417170423.zip http://www.kase.kz/files/market_valuation/ru/2017/val170417170423.zip
4 /files/market_valuation/ru/2017/val170410170416.zip http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip
5 /files/market_valuation/ru/2017/val170403170409.zip http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip
6 /files/market_valuation/ru/2017/val170327170402.zip http://www.kase.kz/files/market_valuation/ru/2017/val170327170402.zip

My function:

Price_KASE <- function(data){
    URL = data[,2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

    apply(data, 1, Price_KASE(data))

And the error message:

Error in download.file(URL, dir) : 
  'url' must be a length-one character vector

Please explain what is wrong with my code and how I can fix it. Thank you.

An alternative using a for loop:

for (i in 1:length(data[,2])){
    URL = data[i, 2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

This seems to work, but after the 4th or 5th file I get the following warning:

In download.file(URL, dir) : cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip': HTTP status was '503 Service Temporarily Unavailable'


Answer 1:


I think the URLs in your data frame are stored as factors. Try converting the column to character:

data[,2] <- as.character(data[,2])

If you are reading the data from a .csv file or constructing the data frame yourself, consider setting stringsAsFactors = FALSE.
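As a rough illustration of the factor issue, the sketch below builds a small data frame both ways. The two-row `urls` vector is a stand-in for the real data, and `stringsAsFactors = TRUE` is set explicitly only to mimic the default that `data.frame()` had before R 4.0.0.

```r
# Hypothetical two-row stand-in for the URL column in the question.
urls <- c("http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip",
          "http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip")

# Force factor storage to mimic the pre-4.0.0 default of data.frame().
as_factor <- data.frame(URL = urls, stringsAsFactors = TRUE)
is.factor(as_factor$URL)      # TRUE

# Either convert the column afterwards ...
as_factor$URL <- as.character(as_factor$URL)

# ... or keep it as character from the start.
as_char <- data.frame(URL = urls, stringsAsFactors = FALSE)
is.character(as_char$URL)     # TRUE
```

Checking `class(data[, 2])` on the real data frame should show whether the conversion is needed at all.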

UPDATE:

I noticed something else: with MARGIN = 1, apply passes each row to the function as a single vector, not as a data frame. The function therefore has to index it with data[2] rather than data[,2], and apply has to be given the function itself (Price_KASE), not the result of calling it (Price_KASE(data)). With those changes the example below runs to completion and gives the output shown.

data1 <- data.frame(a = "/files/market_valuation/ru/2017/val170502170509.zip",
                b = "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip")


Price_KASE <- function(data){
  URL = data[2]  # each row arrives as a vector, so use data[2], not data[,2]
  dir = basename(URL)
  download.file(URL, dir)
  unzip(dir)
  # pattern is a regular expression, not a glob
  TXT <- list.files(pattern = "\\.TXT$")
  zip <- list.files(pattern = "\\.zip$")
  file.remove(TXT, zip)
}

data1$b <- as.character(data1$b)

apply(data1, 1, Price_KASE)

#     [,1]
#[1,] TRUE
#[2,] TRUE
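
The 503 status mentioned in the question is not covered above. One possible cause, which is only an assumption here, is that the server throttles rapid repeated requests. Under that assumption, a minimal sketch is to iterate over the URL vector directly, pause after each request, and retry failed downloads. The function name `download_all` and its `fetch`, `tries` and `pause` parameters are illustrative and not part of the original code.

```r
# Sketch: download each URL in turn, pausing between requests and
# retrying when a download fails. `fetch` defaults to download.file
# but can be swapped for another function with the same arguments.
download_all <- function(urls, fetch = download.file, tries = 3, pause = 5) {
  urls <- as.character(urls)  # guard against a factor column
  vapply(urls, function(url) {
    dest <- basename(url)
    ok <- FALSE
    for (attempt in seq_len(tries)) {
      # Any error or warning (such as an HTTP 503) counts as a failure.
      ok <- tryCatch({
        fetch(url, dest, mode = "wb")  # binary mode keeps zip files intact
        TRUE
      }, error = function(e) FALSE, warning = function(w) FALSE)
      Sys.sleep(pause)  # pause after every request, successful or not
      if (ok) break
    }
    ok
  }, logical(1), USE.NAMES = FALSE)
}
```

Calling `download_all(data[, 2])` would return one logical value per URL, so the unzip and clean-up steps can be limited to the files that were actually downloaded. The `tries` and `pause` values are guesses and may need tuning for the server in question.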


Source: https://stackoverflow.com/questions/43774949/apply-function-that-downloads-zip-files-and-deletes-specific-files
