Does R have any package for parsing out the parts of a URL?

无人及你 2020-12-30 06:32

I have a list of URLs that I would like to parse and normalize.

I'd like to be able to split each address into parts so that I can identify "www.google.com/test/in

6 Answers
  •  余生分开走
    2020-12-30 06:32

    If you like tldextract, one option would be to use the version hosted on App Engine:

    library(RJSONIO)  # provides fromJSON()
    test <- c("test.server.com/test", "www.google.com/test/index.asp", "http://test.com/?ex")
    # Query the tldextract web API once per URL and parse each JSON response
    lapply(paste0("http://tldextract.appspot.com/api/extract?url=", test), fromJSON)
    [[1]]
       domain subdomain       tld 
     "server"    "test"     "com" 
    
    [[2]]
       domain subdomain       tld 
     "google"     "www"     "com" 
    
    [[3]]
       domain subdomain       tld 
       "test"        ""     "com" 
    
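If calling a remote service is not an option, a rough offline split can be sketched in base R. This is only an illustration, not what tldextract does: `parse_url_naive` is a hypothetical helper name, and it assumes the TLD is always the last dot-separated label of the host, so multi-label suffixes such as "co.uk" are split incorrectly (tldextract relies on the Public Suffix List to handle those). Hosts with user info or IP addresses are not handled either.

```r
# Hypothetical helper (not part of any package): split URLs using base R only.
parse_url_naive <- function(urls) {
  # Prepend a scheme when it is missing so every input has the same shape.
  has_scheme <- grepl("^[A-Za-z][A-Za-z0-9+.-]*://", urls)
  urls[!has_scheme] <- paste0("http://", urls[!has_scheme])

  # Capture scheme, host, and everything after the host.
  m <- regmatches(urls, regexec("^([^:]+)://([^/?#]+)(.*)$", urls))
  parts <- do.call(rbind, lapply(m, function(x) x[2:4]))

  # Lower-case the host and drop a trailing port such as ":8080".
  host <- sub(":[0-9]+$", "", tolower(parts[, 2]))

  # ASSUMPTION: the TLD is the last label, the domain the one before it.
  labels <- strsplit(host, ".", fixed = TRUE)
  n <- lengths(labels)
  data.frame(
    scheme    = parts[, 1],
    subdomain = mapply(function(l, k) paste(head(l, max(k - 2, 0)), collapse = "."),
                       labels, n),
    domain    = mapply(function(l, k) if (k >= 2) l[k - 1] else l[k], labels, n),
    tld       = mapply(function(l, k) if (k >= 2) l[k] else "", labels, n),
    path      = parts[, 3],
    stringsAsFactors = FALSE
  )
}

test <- c("test.server.com/test", "www.google.com/test/index.asp", "http://test.com/?ex")
parse_url_naive(test)
```

For the three sample URLs this yields the same subdomain/domain/tld split as the API output above, plus the scheme and the remaining path, which can then be normalized separately.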
