How to select R data.table rows based on substring match (a la SQL like)

牧云@^-^@ 提交于 2019-11-26 18:55:13

问题


I have a data.table with a character column, and want to select only those rows that contain a substring in it. Equivalent to SQL WHERE x LIKE '%substring%'

E.g.

> Months = data.table(Name = month.name, Number = 1:12)
> Months["mb" %in% Name]
Empty data.table (0 rows) of 2 cols: Name,Number

How would I select only the rows where Name contains "mb"?


回答1:


data.table has a like function.

Months[like(Name,"mb")]
        Name Number
1: September      9
2:  November     11
3:  December     12

Or, %like% looks nicer :

> Months[Name %like% "mb"]
    Name Number
1: September      9
2:  November     11
3:  December     12

Note that %like% and like() use grepl (returns logical vector) rather than grep (returns integer locations). That's so it can be combined with other logical conditions :

> Months[Number<12 & Name %like% "mb"]
        Name Number
1: September      9
2:  November     11

and you get the power of regular expression search (not just % or * wildcard), too.




回答2:


The operator %in% does not do partial string matching it is used for finding if values exist in another set of values i.e. "a" %in% c("a","b","c")

To do partial string matching you need to use the grep() function. You can use the grep to return an index of all columns with "mb" in it. Then subset the rows by that index

Months[grep("mb", Name)]    # data.table syntax slightly easier


来源:https://stackoverflow.com/questions/14630335/how-to-select-r-data-table-rows-based-on-substring-match-a-la-sql-like

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!