Subsetting in R using OR condition with strings

Submitted by 巧了我就是萌 on 2019-12-04 08:33:20

Question


I have a data frame with about 40 columns. The second column, data[2], contains the name of the company that the rest of the row describes. However, the company names differ depending on the year (a trailing 09 for 2009 data, nothing for 2010).

I would like to be able to subset the data such that I can pull in both years at once. Here is an example of what I'm trying to do...

subset(data, data[2] == "Company Name 09" | "Company Name", drop = T) 

Essentially, I'm having difficulty using the OR operator within the subset function.

I have also tried other alternatives:

subset(data, data[[2]] == grep("Company Name", data[[2]]))

Perhaps there's an easier way to do it using a string function?

Any thoughts would be appreciated.


Answer 1:


First of all (as Jonathan noted in his comment), to reference the second column you should use either data[[2]] or data[,2]. But if you are using subset, you can use the column name instead: subset(data, CompanyName == ...).

As for your question, I would do one of the following:
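To illustrate the indexing difference, here is a minimal sketch on made-up data. The data frame `df` and its column `CompanyName` are assumptions for illustration, not the asker's actual data.

```r
# Hypothetical mock data; the names df and CompanyName are assumptions
df <- data.frame(value = 1:4,
                 CompanyName = c("Company Name 09", "Other 09",
                                 "Company Name", "Other"),
                 stringsAsFactors = FALSE)

class(df[2])     # single brackets keep the data.frame structure
class(df[[2]])   # double brackets extract the column as a plain vector
class(df[, 2])   # the same vector as df[[2]]

# Inside subset() the column can be referenced by its bare name
res <- subset(df, CompanyName == "Company Name")
```

Comparing a vector against a string yields one logical value per row, which is the form subset expects for its condition.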

subset(data, data[[2]] %in% c("Company Name 09", "Company Name"), drop = TRUE) 
subset(data, grepl("^Company Name", data[[2]]), drop = TRUE)

The second uses grepl (introduced in R version 2.9), which returns a logical vector with TRUE for each match.
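As a rough sketch of how these options compare, the example below runs them on made-up data (the data frame `df` and its `name` column are assumptions for illustration). It also spells out the OR explicitly: the attempt in the question fails because `|` needs a logical comparison on each side, not a bare string, and grep returns integer positions rather than logicals.

```r
# Hypothetical mock data standing in for the asker's data frame
df <- data.frame(value = 1:5,
                 name = c("Company Name 09", "Other Co 09", "Company Name",
                          "Other Co", "Company Name Ltd"),
                 stringsAsFactors = FALSE)

# Spelling out the OR: each side of | is a full comparison
by_or <- subset(df, name == "Company Name 09" | name == "Company Name")

# Exact matching against a set of allowed values
by_in <- subset(df, name %in% c("Company Name 09", "Company Name"))

# Pattern matching: any name that starts with "Company Name"
by_grepl <- subset(df, grepl("^Company Name", name))

by_in$value      # rows 1 and 3
by_grepl$value   # rows 1, 3 and 5
```

Note that the prefix pattern also picks up "Company Name Ltd", so %in% is the safer choice when only the two exact spellings should match.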




Answer 2:


A couple of things:

1) Mock-up data is useful as we don't know exactly what you're faced with. Please supply data if possible. Maybe I misunderstood in what follows?

2) Don't use [[2]] to index your data.frame; I think [,"colname"] is much clearer.

3) If the only difference is a trailing ' 09' in the name, then simply regexp that out:

R> x1 <- c("foo 09", "bar", "bar 09", "foo")
R> x2 <- gsub(" 09$", "", x1)
R> x2
[1] "foo" "bar" "bar" "foo"
R> 

Now you should be able to do your subset on the on-the-fly transformed data:

R> data <- data.frame(value=1:4, name=x1)
R> subset(data, gsub(" 09$", "", name)=="foo")
  value   name
1     1 foo 09
4     4    foo
R> 

You could also have replaced the name column with the regexp'ed values.
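A minimal sketch of that replacement, reusing the mock names from above. The data frame name `df` and the extra `year` column are assumptions added for illustration, so that the year information is not lost once the suffix is stripped.

```r
x1 <- c("foo 09", "bar", "bar 09", "foo")
df <- data.frame(value = 1:4, name = x1, stringsAsFactors = FALSE)

# Record the year first: a trailing " 09" means 2009, otherwise 2010
# (the year column is a hypothetical addition, not part of the original answer)
df$year <- ifelse(grepl(" 09$", df$name), 2009, 2010)

# Overwrite the name column with the suffix removed
df$name <- sub(" 09$", "", df$name)

# A plain equality test now pulls in both years at once
foo_rows <- subset(df, name == "foo")
```

After this, foo_rows holds the rows with value 1 and 4, and the year column still tells the two apart.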



Source: https://stackoverflow.com/questions/2125231/subsetting-in-r-using-or-condition-with-strings
