Filtering data in a dataframe based on criteria

╄→гoц情女王★ submitted on 2019-12-03 13:47:57

Given a data frame "dfrm" with the city names in the "city" column, the population in the "population" column and the average summer temperature in the "meanSummerT" column, any of these would return the subset meeting both of your requirements:

subset( dfrm, population < 1e6 & meanSummerT > 70)
dfrm[ which(dfrm$population < 1e6 & dfrm$meanSummerT > 70) , ]
dfrm[ which(dfrm[['population']] < 1e6 & dfrm[['meanSummerT']] > 70) , ]
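A small self-contained check that the three expressions return the same rows. The "dfrm" values below are invented for illustration, not taken from the question. The extra row with a missing population shows why which() is used: a bare logical index turns an NA comparison into an all-NA row, while which() and subset() both drop it.

```r
# Hypothetical example data; city names and numbers are made up
dfrm <- data.frame(
  city        = c("A-town", "B-ville", "C-city", "D-burg"),
  population  = c(5e5, 2e6, 8e5, NA),
  meanSummerT = c(75, 80, 65, 90),
  stringsAsFactors = FALSE
)

cond <- dfrm$population < 1e6 & dfrm$meanSummerT > 70  # TRUE FALSE FALSE NA

r1 <- subset(dfrm, population < 1e6 & meanSummerT > 70)
r2 <- dfrm[which(cond), ]
r3 <- dfrm[which(dfrm[["population"]] < 1e6 & dfrm[["meanSummerT"]] > 70), ]

# All three keep only the "A-town" row
print(r1$city)

# Without which(), the NA in cond produces an extra row filled with NAs
r4 <- dfrm[cond, ]
print(nrow(r4))  # 2: the "A-town" row plus an all-NA row
```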

If you wanted just the names of the cities meeting those joint criteria then these would work:

subset( dfrm, population < 1e6 & meanSummerT > 70 , city)
dfrm[ which(dfrm$population < 1e6 & dfrm$meanSummerT > 70) , "city" ]
dfrm[ which(dfrm[['population']] < 1e6 & dfrm[['meanSummerT']] > 70) , "city" ]
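The two styles differ in what they return, which matters if the result is passed on to other code. A sketch on made-up data: subset() with a select argument keeps a one-column data frame, while "[" with a single column name drops to a plain vector unless drop = FALSE is given.

```r
# Hypothetical example data
dfrm <- data.frame(
  city        = c("A-town", "B-ville", "C-city"),
  population  = c(5e5, 2e6, 8e5),
  meanSummerT = c(75, 80, 65),
  stringsAsFactors = FALSE
)
keep <- which(dfrm$population < 1e6 & dfrm$meanSummerT > 70)

s1 <- subset(dfrm, population < 1e6 & meanSummerT > 70, city)  # data frame
s2 <- dfrm[keep, "city"]                                        # character vector
s3 <- dfrm[keep, "city", drop = FALSE]                          # data frame again

print(class(s1))  # "data.frame"
print(class(s2))  # "character"
print(class(s3))  # "data.frame"
```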

Note that the column names are not quoted inside subset() or after the "$" operator, but they are quoted inside "[[".
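The quoting difference also decides which form works when the column name is held in a variable. A minimal sketch on invented data (col_name is a hypothetical variable): "[[" evaluates its argument, so it accepts the variable, whereas "$" takes the text after it literally and here finds no column called "col_name".

```r
# Hypothetical example data
dfrm <- data.frame(city = c("A-town", "B-ville"),
                   population = c(5e5, 2e6),
                   stringsAsFactors = FALSE)
col_name <- "population"

a <- dfrm$population      # bare name after $
b <- dfrm[["population"]] # quoted name inside [[
v <- dfrm[[col_name]]     # [[ also accepts a variable holding the name
d <- dfrm$col_name        # $ looks for a column literally named "col_name"

print(identical(a, v))    # TRUE
print(is.null(d))         # TRUE: no such column
```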

mnel

You are looking for subset.

If your data is called mydata:

newdata <- subset(mydata, population < 1e6)

Or you could use [, which is programmatically safer:

newdata <- mydata[mydata$population < 1e6, ]
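Two points about "[" are worth seeing in code. First, because the column name and limit are ordinary values, "[" fits naturally inside a function, where subset()'s bare-name evaluation is awkward. Second, the comma matters: with a single index and no comma, "[" selects columns of a data frame, not rows. The helper filter_below() and the data are hypothetical, for illustration only.

```r
# Hypothetical example data
mydata <- data.frame(city = c("A-town", "B-ville", "C-city"),
                     population = c(5e5, 2e6, 8e5),
                     stringsAsFactors = FALSE)

# Hypothetical helper: column name and limit are plain arguments
filter_below <- function(df, col, limit) {
  # which() drops NA comparisons; drop = FALSE keeps a data frame
  df[which(df[[col]] < limit), , drop = FALSE]
}

small <- filter_below(mydata, "population", 1e6)
print(small$city)  # "A-town" "C-city"

# Without the comma, a logical index picks columns: here only "city"
cols_only <- mydata[c(TRUE, FALSE)]
print(names(cols_only))  # "city"
```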

For more than one condition, use & or | as appropriate.
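A short sketch of the difference on made-up data: & keeps rows where both conditions hold, | keeps rows where at least one holds. These are the element-wise operators; && and || are meant for single TRUE/FALSE values and are not suitable for filtering rows.

```r
# Hypothetical example data
mydata <- data.frame(city = c("A-town", "B-ville", "C-city"),
                     population = c(5e5, 2e6, 8e5),
                     meanSummerT = c(75, 68, 65),
                     stringsAsFactors = FALSE)

# Both conditions must be TRUE
both <- subset(mydata, population < 1e6 & meanSummerT > 70)

# At least one condition must be TRUE
either <- subset(mydata, population < 1e6 | meanSummerT > 70)

print(both$city)    # "A-town"
print(either$city)  # "A-town" "C-city"
```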

You could also use the sqldf package to query the data frame with SQL:

library(sqldf)

newdata <- sqldf('select * from mydata where population < 1e6')

Or you could use data.table, which makes the [ syntax easier (columns can be referred to by bare name inside the brackets) and is also memory-efficient:

library(data.table)

mydatatable <- data.table(mydata)
newdata <- mydatatable[population < 1e6]