I have a list of URLs that I would like to parse and normalize.
I'd like to be able to split each address into parts so that I can identify "www.google.com/test/in
If you like tldextract, one option would be to use the version hosted on App Engine:
library(RJSONIO)  # provides fromJSON()
test <- c("test.server.com/test", "www.google.com/test/index.asp", "http://test.com/?ex")
# Build one API request per URL and parse each JSON response
lapply(paste0("http://tldextract.appspot.com/api/extract?url=", test), fromJSON)
[[1]]
domain subdomain tld
"server" "test" "com"
[[2]]
domain subdomain tld
"google" "www" "com"
[[3]]
domain subdomain tld
"test" "" "com"
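
If you would rather not depend on a remote service, a rough offline approximation is possible in base R. The sketch below is only an illustration: `split_url()` is a hypothetical helper, not part of any package, and it naively assumes the last dot-separated label of the host is the TLD. Unlike tldextract it does not consult the public suffix list, so hosts such as "example.co.uk" will be split incorrectly, and ports or credentials in the host are not handled.

```r
# Hypothetical helper (not from any package): naive URL splitting in base R.
# Assumption: the last host label is the TLD, the one before it is the domain.
split_url <- function(urls) {
  # Drop an optional scheme such as "http://"
  no_scheme <- sub("^[A-Za-z][A-Za-z0-9+.-]*://", "", urls)
  # The host is everything before the first "/", "?" or "#"
  host <- sub("[/?#].*$", "", no_scheme)
  # Whatever follows the host (path and query), "" when absent
  rest <- substring(no_scheme, nchar(host) + 1)
  labels <- strsplit(host, ".", fixed = TRUE)

  tld <- vapply(labels, function(x) {
    if (length(x) >= 1) x[length(x)] else ""
  }, character(1))
  domain <- vapply(labels, function(x) {
    if (length(x) >= 2) x[length(x) - 1] else ""
  }, character(1))
  subdomain <- vapply(labels, function(x) {
    if (length(x) > 2) paste(x[seq_len(length(x) - 2)], collapse = ".") else ""
  }, character(1))

  data.frame(subdomain, domain, tld, path = rest, stringsAsFactors = FALSE)
}

test <- c("test.server.com/test", "www.google.com/test/index.asp", "http://test.com/?ex")
res <- split_url(test)
print(res)
```

For the three sample URLs this should give the same domain, subdomain, and tld values as the service output above, plus the path as an extra column. For anything beyond simple hosts, a parser backed by the public suffix list is the safer choice.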