Searching for unique values in dataframe and creating a table with them

回眸只為那壹抹淺笑 提交于 2019-11-29 15:11:45

Nothing is impossible.

x <- read.csv(text="
User, Source
1, http://www.googleclick.com?utm_source=ADX&ID56789
2, http://www.googleclick.com?utm_source=ADW&ID56009
3, http://www.googleclick.com?utm_source=ADWords&ID53389
", header=TRUE, stringsAsFactors=FALSE)

First, use strsplit

strsplit(x$Source, split="\\?utm_source=")
[[1]]
[1] " http://www.googleclick.com" "ADX&ID56789"                

[[2]]
[1] " http://www.googleclick.com" "ADW&ID56009"                

[[3]]
[1] " http://www.googleclick.com" "ADWords&ID53389" 

Then find a red-hot poker and stick in the eye of your so-called advisor.


EDIT:

As suggested by Paul Hiemstra, you can also use a regular expression directly:

gsub(".*\\?utm_source=", "", x$Source)
[1] "ADX&ID56789"     "ADW&ID56009"     "ADWords&ID53389"

@Andrie's answer does the trick. Here's another way using using regmatches and gregexpr that might be useful.

d <- read.table(text="User URL
1 http://www.googleclick.com?utm_source=ADX&ID56789
2 http://www.googleclick.com?utm_source=ADW&ID56009
3 http://www.googleclick.com?utm_source=ADWords&ID53389", header=TRUE)

domain.pat <- '((?<=www.)([[:alnum:]_]+))'
source.pat <- '((?<=utm_source=)([[:alnum:]&]+))' # exclude the '&' here to only grab up to the '&'
all.matches <- gregexpr(paste(domain.pat, source.pat, sep='|'), d$URL, perl=TRUE)
all.substrings <- regmatches(d$URL, all.matches)
do.call(rbind, all.substrings)

#      [,1]          [,2]             
# [1,] "googleclick" "ADX&ID56789"    
# [2,] "googleclick" "ADW&ID56009"    
# [3,] "googleclick" "ADWords&ID53389"
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!