I have a large table read into R with more than 10,000 rows and 10 columns.
The 3rd column of the table contains the names of the hospitals. Some of them are
Using dplyr
Setting up the data, using @Chase's sample data:
#Sample data
df <- data.frame(patients = 1:5, treatment = letters[1:5],
hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))
#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")
Now filter the data using dplyr's filter():
library(dplyr)
df %>% filter(hospital %in% goodHosp)
Use the %in% operator.
#Sample data
dat <- data.frame(patients = 1:5, treatment = letters[1:5],
hospital = c("yyy", "yyy", "zzz", "www", "uuu"), response = rnorm(5))
#List of hospitals we want to do further analysis on
goodHosp <- c("yyy", "uuu")
You can either index directly into your data.frame object:
dat[dat$hospital %in% goodHosp, ]
or use the subset command:
subset(dat, hospital %in% goodHosp)
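All of the approaches above rely on the same thing: %in% returns one TRUE/FALSE value per element of its left-hand side. Here is a minimal sketch of that, reusing the sample hospital names. The name `keep` is only an illustrative variable, and the data frame is cut down to two columns to keep the example short.

```r
# Same sample hospitals as above; the random response column is left out
hospital <- c("yyy", "yyy", "zzz", "www", "uuu")
goodHosp <- c("yyy", "uuu")

# %in% gives one logical value per element of `hospital`
keep <- hospital %in% goodHosp
keep

# The logical vector selects rows; negating it selects the complement
dat <- data.frame(patients = 1:5, hospital = hospital)
dat[keep, ]    # rows whose hospital is in goodHosp
dat[!keep, ]   # rows for every other hospital
```

Because %in% never returns NA, indexing with it directly should behave the same as subset() or filter() for this kind of membership test.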