R remove duplicate elements in character vector, not duplicate rows

Submitted by 狂风中的少年 on 2019-12-05 04:32:27

Try this:

within(dates, Dates <- lapply(Dates, unique))
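A minimal sketch of what that call does, assuming Dates is a list column holding one character vector per row. The sample data is made up for illustration, and the direct assignment below is just another way of writing the same lapply call:

```r
# Hypothetical sample data: Dates is a list column, one character vector per row.
dates <- data.frame(Doc = c(12345, 23456))
dates$Dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000")
)

# unique() runs on each row's vector separately, so duplicates are dropped
# within a row while every row of the data frame is kept.
dates$Dates <- lapply(dates$Dates, unique)
```

If Dates is instead stored as text such as 'c("a", "b")', the strings have to be parsed first, which is what the other answers address.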

I solved my problem of removing duplicate values from a character vector by wrapping the strapply() call (from the gsubfn package) in lapply(..., unique):

df1$date <- as.character(lapply((strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})|(\\s\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))")),unique))

Thanks for all your help.

I would use gsub to strip the c( and ) from each entry in dates, then, for each row, split what remains on commas with strsplit and call unique on the result.

Untested, but maybe something like:

sapply(dates$dates, function(x) {
  new.x <- gsub("c\\(|\\)", "", x)        # drop the literal c( and )
  new.x <- strsplit(new.x, ",\\s*")[[1]]  # split on commas, ignoring spaces after them
  unique(new.x)
})

You might be looking for something like this.

 df

     Doc                                       Dates
 1 12345                c("06/01/2000","08/09/2002")
 2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
 3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
 4 45678                c("06/01/2000","08/09/2002")

 Using eval and parse (this assumes every row ends up with exactly two unique dates):
 x <- t(sapply(df[,"Dates"],function(x){unique(eval(parse(text = x)))}))
 df$Dates <- paste(x[,1],x[,2],sep=",")

 df
      Doc                 Dates
  1 12345 06/01/2000,08/09/2002
  2 23456 07/01/2000,09/08/2003
  3 34567 09/06/2004,12/30/2006
  4 45678 06/01/2000,08/09/2002
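The two-column paste above only works when every row has the same number of unique dates. Here is a hedged variant of the same eval/parse idea that collapses each row on its own, so rows may keep different numbers of dates. The third row of the sample data is invented to show that case. Since eval(parse()) executes whatever the string contains, this is only reasonable for trusted input.

```r
# Sample data in the same shape as above; the third row is hypothetical and
# collapses to a single unique date.
df <- data.frame(
  Doc = c(12345, 23456, 34567),
  Dates = c(
    'c("06/01/2000","08/09/2002")',
    'c("07/01/2000", "09/08/2003", "07/01/2000")',
    'c("09/06/2004", "09/06/2004", "09/06/2004")'
  ),
  stringsAsFactors = FALSE
)

# Parse each string into a character vector, de-duplicate it, and join the
# survivors back into one comma-separated string per row.
df$Dates <- vapply(
  df$Dates,
  function(x) paste(unique(eval(parse(text = x))), collapse = ","),
  character(1),
  USE.NAMES = FALSE
)
```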


 The same can be achieved with a regular expression:

 paste(unique(unlist(strsplit(gsub("c\\(|\\)","",'c("24/07/2012","22/01/2012","24/07/2012")'),","))),sep = "")

 [1] "\"24/07/2012\"" "\"22/01/2012\""

 I haven't tried this on the full data, but it works on the sample string above.
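
Note that the result above still carries the embedded double quotes. A possible base R alternative, offered as a sketch rather than a tested solution for the real data, is to extract the quoted pieces with gregexpr and regmatches and then strip the quotes before de-duplicating. The variable names are only illustrative.

```r
x <- 'c("24/07/2012","22/01/2012","24/07/2012")'

# Match every run of characters enclosed in double quotes.
quoted <- regmatches(x, gregexpr('"[^"]*"', x))[[1]]

# Remove the surrounding quotes, then drop repeated values.
values <- unique(gsub('"', "", quoted, fixed = TRUE))
```

Because the pattern only picks up quoted items, spaces after the commas do not affect the comparison.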