Checking validity of a list of URLs using GET

Submitted by 不打扰是莪最后的温柔 on 2019-12-25 04:59:11

Question


I have a .csv file of URLs that I need to validate.

I want to apply httr's GET to every row of the data frame.

 > websites
          website
1   www.msn.com
2   www.wazl.com
3  www.amazon.com
4 www.rifapro.com

I did find similar questions and tried to apply the provided answers; however, they are not working.

> apply(websites, 1, transform, result=GET(websites$website))
Error: length(url) == 1 is not TRUE

> apply(websites, websites[,1], GET())
Error in handle_url(handle, url, ...) : 
  Must specify at least one of url or handle

I am not sure what I am doing wrong.


Answer 1:


You could do something like this:

websites <- read.table(header = TRUE, text = "website
1   www.msn.com
2   www.wazl.com
3  www.amazon.com
4 www.rifapro.com")
library(httr)
# prepend "http://" wherever a scheme is missing
urls <- paste0(ifelse(grepl("^https?://", websites$website, ignore.case = TRUE), "", "http://"),
               websites$website)
# issue one HEAD request per unique URL; try() keeps unreachable hosts from aborting the loop
lst <- lapply(unique(tolower(urls)), function(url) try(HEAD(url), silent = TRUE))
names(lst) <- urls
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))
# http://www.msn.com    http://www.wazl.com  http://www.amazon.com http://www.rifapro.com 
#                200                   -999                    405                   -999 

No need for a GET request, IMHO; a HEAD request already returns the status code without downloading the page body.
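If some hosts hang rather than fail outright, a small variation of the snippet above (a sketch, not part of the original answer; it reuses the urls vector defined there) adds a per-request timeout so dead sites give up quickly:

# sketch: same HEAD-based check, but give up after 10 seconds per URL
lst <- lapply(unique(tolower(urls)),
              function(url) try(HEAD(url, timeout(10)), silent = TRUE))
sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x))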




Answer 2:


@LukeA gave me the answer, and I just altered it as shown below to generate a data frame rather than a list. Thank you, LukeA.

# normalize the URLs and issue a HEAD request per unique URL, as in the answer above
urls <- paste0(ifelse(grepl("^https?://", websitm$WEBSITE, ignore.case = TRUE), "", "http://"),
               websitm$WEBSITE)
lst <- lapply(unique(tolower(urls)), function(url) try(HEAD(url), silent = TRUE))
# combine the responses and the URLs, then derive a status code per response
a <- list(lst, urls)
b <- as.data.frame(sapply(a, rbind))
b$outcome <- sapply(b$V1, function(x) if (inherits(x, "try-error")) -999 else status_code(x))
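For reference, the same data frame can also be assembled directly from urls and lst, without the intermediate list and rbind step (a sketch using the objects defined just above):

# sketch: pair each unique URL with its status code in one data frame
b <- data.frame(url = unique(tolower(urls)),
                outcome = sapply(lst, function(x) if (inherits(x, "try-error")) -999 else status_code(x)),
                stringsAsFactors = FALSE)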

After refining my code above further, so that the URLs are read straight from the .csv file and duplicates are dropped:

website <- read.csv(file = "path")
website <- website[!duplicated(website$Website), ]
websitm <- website

# prepend "http://www." or "http://" wherever they are missing
websitm$Website <- paste0(ifelse(grepl("^(https?://)?www\\.", websitm[, 2], ignore.case = TRUE), "", "http://www."), websitm[, 2])
websitm$Website <- paste0(ifelse(grepl("^https?://", websitm[, 2], ignore.case = TRUE), "", "http://"), websitm[, 2])

# HEAD request with a 20-second timeout; try(..., silent = TRUE) so failures don't stop the run
Httpcode <- function(x) { try(HEAD(x, timeout(seconds = 20)), silent = TRUE) }
websitm$error <- apply(websitm[, 2, drop = FALSE], 1, Httpcode)
websitm$outcome <- sapply(websitm$error, function(x) if (inherits(x, "try-error")) -999 else status_code(x))
websitm <- data.frame(lapply(websitm, as.character), stringsAsFactors = FALSE)
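A quick way to inspect the result (a hypothetical check against the data frame built above; note that the final as.character step stores outcome as text, so compare against "200"):

table(websitm$outcome)                               # how many URLs returned each status / -999
valid_sites <- websitm[websitm$outcome == "200", ]   # sites that answered with HTTP 200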


Source: https://stackoverflow.com/questions/43915316/checking-validity-of-a-list-og-urls-using-get
