R download.file with “wget”-method and specifying extra wget options

Submitted by 与世无争的帅哥 on 2021-02-08 07:52:38

Question


I have a probably rather basic question about using the download.file function in R with the wget method and some of wget's extra options, but I just cannot get it to work.

What I want to do: download a local copy of a webpage (actually several webpages, but for now the challenge is to get it to work with even one).

Challenge: I need the local copy to look exactly like the online version, which means it must also include links, icons, etc. I found wget to be a good tool for this, and I would like to specify some of its extra options, such as --random-wait, -p and -r. I found some very helpful tutorials on this; however, none of them passed the extra options from R, only to wget directly.

So here is the code I have put together for this:

download.file('https://www.wikipedia.org/', destfile = "wikipage", method = "wget", extra = getOption("--random wait", "-r", "-p"))

which does not work. I suspect there are problems with both the "wget" method and the way I specify the extras.

Can anyone help? It would be much appreciated.

A bonus question: I know that destfile is supposed to specify a file name for the downloaded document, but is there any way to specify a folder path to which all downloaded files should be saved?

Thank you in advance!

Best, Carolin


Answer 1:


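getOption() only looks up R session options (see ?options) and is not needed here. You can pass multiple wget options directly in the extra argument. Note also that the flag is spelled --random-wait, with a hyphen.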
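To see why the original call fails: getOption(x, default) looks up an R session option named x and returns default when no such option is set. It has only those two arguments, so the third argument in the question triggers an "unused argument" error, and even with two arguments it would just hand back the default string instead of building a list of wget options. A small sketch (the variable names are illustrative):

```r
# getOption() has two arguments: the option name and a default value.
# "--random wait" is not an R option, so the default is returned unchanged.
two_args <- getOption("--random wait", "-r")
print(two_args)

# Passing a third argument, as in the question, raises an "unused argument" error.
three_args <- tryCatch(
  getOption("--random wait", "-r", "-p"),
  error = function(e) e
)
print(inherits(three_args, "error"))

# What download.file() expects instead: the wget options as plain strings.
extra <- "-r -p --random-wait"
print(extra)
```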

For the bonus question, you can include the folder in the destfile path, and the downloaded file will be saved there (the folder should already exist).

download.file('https://www.wikipedia.org/', destfile = "mydirectory/wikipage.html", method = "wget", extra = "-r -p --random-wait")
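A minimal sketch of the same call that first creates the target folder and builds destfile with file.path(). It also passes extra as a character vector, which the download.file() help documents as a vector of additional command-line arguments for the wget method. The folder name mydirectory comes from the answer; placing it under tempdir(), the variable names and the run_download switch are assumptions for illustration, and the download itself is off by default so the snippet can be run safely.

```r
# Target folder (assumption: "mydirectory" under the session's temp directory)
out_dir <- file.path(tempdir(), "mydirectory")

# Create the folder if needed; it should exist before wget writes the -O target
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Full path of the file that download.file() will write
dest <- file.path(out_dir, "wikipage.html")

# One wget command-line argument per element of the character vector
wget_opts <- c("-r", "-p", "--random-wait")

# Set to TRUE to actually run wget (requires wget on the PATH and network access)
run_download <- FALSE
if (run_download && nzchar(Sys.which("wget"))) {
  download.file("https://www.wikipedia.org/", destfile = dest,
                method = "wget", extra = wget_opts)
}
```

With -r this still has the single-file limitation the answer describes, since all fetched items end up in dest.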

However, download.file() passes destfile to wget as its -O output file, so everything fetched via -r and -p is written into that same single file.

Note that a similar question was asked a while ago (I only found it now). The suggested solution there was to use system() instead of download.file() to run the wget command. Adapted to your problem:

setwd("./mydirectory")
system("wget http://www.wikipedia.org -p -k --random-wait")
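The setwd() step can be avoided by letting wget pick the output folder itself with its -P (--directory-prefix) option, and by calling it through system2(), which takes the arguments as a character vector. This is a sketch under assumptions: the folder under tempdir() and the run_download switch are illustrative, and the options mirror the answer's -p -k --random-wait.

```r
# Folder that wget should save the page and its requisites into (assumption)
out_dir <- file.path(tempdir(), "mydirectory")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# -P sets wget's directory prefix, so no setwd() is needed.
# system2() does not quote args itself, so shQuote() protects paths with spaces.
wget_args <- c("-p", "-k", "--random-wait",
               "-P", shQuote(out_dir),
               "https://www.wikipedia.org/")

# Set to TRUE to run wget (requires wget on the PATH and network access)
run_download <- FALSE
if (run_download && nzchar(Sys.which("wget"))) {
  status <- system2("wget", args = wget_args)
  print(status)  # exit status of wget; 0 means success
}
```

Unlike download.file() with -O, this keeps each page and requisite as a separate file inside out_dir.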

Edit: Please also note that both commands will only work on systems with wget installed. On Linux/BSD/Mac, the package to install is usually called wget. On Windows, wget is (according to the download.file() help) available from packages such as gnuwin32 and Cygwin. Even then, the system() command may not work if the system does not know where the wget executable is; in that case, you may need to specify the absolute path to the wget executable.
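One way to check from R whether wget can be found is Sys.which(), which returns the full path of an executable on the PATH, or an empty string if it is not found. A sketch; the Windows fallback path below is a hypothetical placeholder, so substitute wherever gnuwin32 or Cygwin actually installed wget.exe.

```r
# Look for wget on the PATH; Sys.which() returns "" if it is not found
wget_path <- unname(Sys.which("wget"))

if (!nzchar(wget_path)) {
  # Hypothetical install location; adjust to the real path of wget.exe
  candidate <- "C:/Program Files (x86)/GnuWin32/bin/wget.exe"
  if (file.exists(candidate)) wget_path <- candidate
}

if (nzchar(wget_path)) {
  message("Using wget at: ", wget_path)
} else {
  message("wget not found; install it or set its absolute path")
}
```

If a path was found, it can be used as the command, e.g. system2(wget_path, args = ...).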



Source: https://stackoverflow.com/questions/50293314/r-download-file-with-wget-method-and-specifying-extra-wget-options
