subset

r subset rows by criteria and by factor group

强颜欢笑 posted on 2019-12-07 08:57:07
Question: I have this data.frame with a lot of NAs:

    df <- data.frame(a = rep(letters[1:3], each = 3),
                     b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7))

    > df
      a  b
    1 a NA
    2 a NA
    3 a NA
    4 b  1
    5 b NA
    6 b  3
    7 c NA
    8 c NA
    9 c  7

I would like to subset this data frame to keep only the factor groups that have at least two non-NA values, such as this:

      a  b
    1 b  1
    2 b NA
    3 b  3

I have tried this call, but it doesn't work:

    subset(df, sum(!is.na(b)) < 1, by = a)
    > [1] a b
    <0 rows> (or 0-length row.names)

Any suggestions?
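One possible approach, sketched here in base R rather than taken from an accepted answer, is to count the non-NA values within each group and keep the rows whose group count is at least two. The names `n_non_na` and `result` are illustrative only.

```r
df <- data.frame(a = rep(letters[1:3], each = 3),
                 b = c(NA, NA, NA, 1, NA, 3, NA, NA, 7))

# For every row, the number of non-NA values of b within that row's group of a.
n_non_na <- ave(as.integer(!is.na(df$b)), df$a, FUN = sum)

# Keep only rows belonging to groups with at least two non-NA values.
result <- df[n_non_na >= 2, ]
result
```

The original `subset()` call has no `by` argument to group on, and `sum(!is.na(b))` is computed once over the whole column, so the condition is a single value rather than one value per group.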

Subset a dataframe based on a single condition applied to multiple columns

痞子三分冷 posted on 2019-12-07 08:38:59
Question: I've had a look through the existing subset Q&As on this site and couldn't quite find what I was looking for. I want to subset a data frame based on one condition (e.g. the value is below 5), but I only want the rows where the value in all of the columns is below 5. For example, using the iris dataset, I would like to select all the rows where columns 1-3 all have values below 5.

    subdata <- iris[which(iris[,1:3]<5),]

This doesn't do it for me. I get lots of NA rows at the bottom of
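A minimal sketch of one way to express "all of columns 1-3 are below 5" in base R. The likely reason for the NA rows is that `which()` on a logical matrix returns positions counted over the whole 150 x 3 matrix, so positions above 150 point at rows that do not exist. The names `cols` and `all_below` are illustrative.

```r
cols <- 1:3

# iris[, cols] < 5 is a logical matrix; a row qualifies when every one of
# its entries is TRUE, i.e. when the row sum equals the number of columns.
all_below <- rowSums(iris[, cols] < 5) == length(cols)

subdata <- iris[all_below, ]
head(subdata)
```

This relies on the selected columns containing no NAs, which holds for iris; with missing values, `rowSums(..., na.rm = TRUE)` would need a deliberate choice about how NAs count.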

subsetting dataframe in R using two criteria, one of them is regular expression

心不动则不痛 posted on 2019-12-07 08:25:17
Question: I have a dataset something like this:

    col_a col_b    col_c
    1     abc_boy  1
    2     abc_boy  2
    1     abc_girl 1
    2     abc_girl 2

I need to pick only the first row, based on col_b and col_c, and then change the value in col_c, something like this:

    df[grep("_boy$",df[,"col_b"]) & df[,"col_c"]=="1","col_c"] <- "yes"

But the code above does not work, since the first criterion and the second criterion do not originate from the same set. I can do it in a dumb way by using an explicit loop, or do a "two-tier"
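The mismatch described above comes from `grep()` returning integer positions while the comparison returns a logical vector. A small sketch using `grepl()`, which returns a logical vector of the same length as the column, so the two criteria can be combined with `&`. The data frame below is rebuilt from the table in the question, and treating `col_c` as character is an assumption.

```r
df <- data.frame(col_a = c(1, 2, 1, 2),
                 col_b = c("abc_boy", "abc_boy", "abc_girl", "abc_girl"),
                 col_c = c("1", "2", "1", "2"),
                 stringsAsFactors = FALSE)

# Both conditions are logical vectors with one entry per row.
hit <- grepl("_boy$", df$col_b) & df$col_c == "1"

df[hit, "col_c"] <- "yes"
df
```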

Subsetting one matrix based in another matrix

半腔热情 posted on 2019-12-07 08:16:53
Question: I would like to select the values of R based on the strings in G, to obtain separate outputs with equal dimensions. These are my inputs:

    R <- 'pr_id sample1 sample2 sample3
    AX-1 100 120 130
    AX-2 150 180 160
    AX-3 160 120 196'
    R <- read.table(text=R, header=T)

    G <- 'pr_id sample1 sample2 sample3
    AX-1 AB AA AA
    AX-2 BB AB NA
    AX-3 BB AB AA'
    G <- read.table(text=G, header=T)

These are my expected outputs:

    AA <- 'pr_id sample1 sample2 sample3
    AX-1 NA 120 130
    AX-2 NA NA NA
    AX-3 NA NA 196'
    AA <- read.table(text=AA,
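A sketch of one way to produce such an output, assuming R and G have identical row and column order. The helper `mask_by_genotype` is hypothetical, not from the original post: it keeps a value of R where the matching cell of G equals the requested genotype and sets everything else to NA.

```r
R <- read.table(text = "pr_id sample1 sample2 sample3
AX-1 100 120 130
AX-2 150 180 160
AX-3 160 120 196", header = TRUE, stringsAsFactors = FALSE)

G <- read.table(text = "pr_id sample1 sample2 sample3
AX-1 AB AA AA
AX-2 BB AB NA
AX-3 BB AB AA", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: blank out every cell of R whose genotype in G
# differs from `genotype` (cells where G is NA are blanked as well).
mask_by_genotype <- function(R, G, genotype) {
  vals <- as.matrix(R[, -1])
  keep <- as.matrix(G[, -1]) == genotype
  vals[!keep | is.na(keep)] <- NA
  out <- R
  out[, -1] <- vals
  out
}

AA <- mask_by_genotype(R, G, "AA")
AA
```

Calling the same helper with "AB" or "BB" would give the other outputs with the same dimensions.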

R error promise already under evaluation when using subset in function but no error in script

主宰稳场 posted on 2019-12-07 07:47:27
I'm getting a strange error when I run the following function:

    TypeIDs=c(18283,18284,17119,17121,17123,17125,17127,17129,17131,17133,18367,18369,18371,18373,18375,18377,18379)
    featsave<-function(featfile,TypeIDs=TypeIDs) {
      mydata1<-read.table(featfile,header=TRUE)
      mydata2<-subset(mydata1,TypeID %in% TypeIDs)
      mydata<-as.data.frame(cast(mydata2, Feat1 + Feat2 + ID ~ TypeID,value="value"))
      save(mydata,file="mydatafile.Rdata",compress=TRUE)
      return(mydata)
    }

with the following data:

    Feat1 Feat2 ID Feat3 Feat4 TypeID value
    1 1 1 6 266 18283 280.00
    1 1 1 6 266 18284 20.00
    1 1 1 6 266 18285 0.00
    1 1 1
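A likely cause, shown here as a reduced sketch rather than a confirmed diagnosis of the full function, is the default argument `TypeIDs=TypeIDs`: when the default is evaluated inside the function, the name `TypeIDs` resolves to the argument itself. Giving the argument a different name avoids the self-reference. The functions `bad` and `featsubset`, the argument `type_ids`, and the toy data are all illustrative.

```r
TypeIDs <- c(18283, 18284, 17119)

# Self-referential default: forcing the argument evaluates `TypeIDs` in the
# function's own environment, which finds the same unevaluated argument.
bad <- function(TypeIDs = TypeIDs) length(TypeIDs)
res <- tryCatch(bad(), error = function(e) conditionMessage(e))
res

# Renaming the argument lets the default find the global vector instead.
mydata1 <- data.frame(TypeID = c(18283, 18285, 17119), value = c(280, 0, 20))
featsubset <- function(data, type_ids = TypeIDs) {
  subset(data, TypeID %in% type_ids)
}
kept <- featsubset(mydata1)
kept
```

This would also explain why the same lines run without error at the top level of a script, where no default argument is involved.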

Most efficient way of subsetting dataframes

╄→гoц情女王★ posted on 2019-12-07 07:10:21
Question: Can anyone suggest a more efficient way of subsetting a data frame without using SQL/indexing/data.table options? I looked for similar questions, and this one suggests the indexing option. Here are ways to subset, with timings.

    #Dummy data
    dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

    #Subset and time
    system.time(x <- dat[dat$x > 500, ])
    #   user  system elapsed
    #  0.092   0.000   0.090
    system.time(x <- dat[which(dat$x > 500), ])
    #   user  system elapsed
    #  0.040   0.032   0.070
    system.time
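When comparing the two forms timed above, it may help to keep in mind that they are not interchangeable once the column contains NAs. The toy data below is an assumption made for illustration; the dummy data in the question has no missing values, so there the two calls return the same rows.

```r
dat <- data.frame(x = c(100, 600, NA, 900), y = 1:4)

# A logical index keeps an all-NA row for every NA in the condition.
by_logical <- dat[dat$x > 500, ]

# which() drops NA conditions and returns only the TRUE positions.
by_which <- dat[which(dat$x > 500), ]

nrow(by_logical)
nrow(by_which)
```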

Determine which column name is causing 'undefined columns selected' error when using subset()

青春壹個敷衍的年華 posted on 2019-12-07 07:05:22
Question: I'm trying to subset a large data frame from a very large data frame, using

    data.new <- subset(data, select = vector)

where vector is a character vector containing the column names I'm trying to isolate. When I do this I get

    Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected

Is there a way to identify which specific column name in the vector is undefined? Through trial and error I've narrowed it down to about 400, but that still doesn't help.

Answer 1: Find the elements
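The answer text is cut off here, so the following is only a sketch of one common way to find the offending names: compare the requested names against `names(data)`. The toy data frame and the names `wanted`, `missing_cols` and `present_cols` are illustrative.

```r
data <- data.frame(a = 1:3, b = 4:6, c = 7:9)
wanted <- c("a", "b", "z", "cc")

# Requested names that are not columns of the data frame.
missing_cols <- setdiff(wanted, names(data))
missing_cols

# Subset using only the names that do exist.
present_cols <- intersect(wanted, names(data))
data.new <- data[, present_cols, drop = FALSE]
```

Names that differ only by case or stray whitespace will show up in `missing_cols` too, which is often where such errors come from.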

How to assign same color to factors across plots in a nested loop for ggplot?

落爺英雄遲暮 posted on 2019-12-07 07:03:05
Question: I am trying to use scale_fill_manual to assign corresponding colors to factors across many plots in a nested for loop. However, the resulting plots all end up black. My overall loop is as follows:

    for(i in seq(from=0, to=100, by=10)){
      for(j in seq(from=0, to=100, by=10)){
        print(ggplot(aes(x , y), data = df)+
          geom_point(inherit.aes = FALSE,data = subset(df,factor_x==i&factor_y==j), aes(x, y, size=point,color=Group))+
          theme_bw())
      }
    }

I am trying to assign each factor in "Group" its own color
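One way this is often handled, sketched under assumptions rather than as the answer to this post, is to build a single named color vector from all levels of `Group` before the loop and pass it to the scale that matches the mapped aesthetic. The loop maps `color`, so `scale_colour_manual` is used below instead of `scale_fill_manual`. The toy data and the name `group_cols` are illustrative, and the plotting part only runs if ggplot2 is installed.

```r
df <- data.frame(x = 1:6, y = c(2, 4, 1, 5, 3, 6),
                 Group = factor(c("g1", "g2", "g3", "g1", "g2", "g3")))

# One color per level, named by level, defined once for every plot.
group_cols <- setNames(hcl.colors(nlevels(df$Group), "Dark 3"),
                       levels(df$Group))

# A subset that contains only some of the groups.
part <- subset(df, x <= 2)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(part, ggplot2::aes(x, y, colour = Group)) +
    ggplot2::geom_point() +
    ggplot2::scale_colour_manual(values = group_cols, drop = FALSE) +
    ggplot2::theme_bw()
}
```

Because the vector is named, each level keeps its color even in plots where some groups are absent.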

In R: subset or dplyr::filter with variable from vector

烂漫一生 posted on 2019-12-07 05:19:34
Question:

    df <- data.frame(a=LETTERS[1:4], b=rnorm(4) )
    vals <- c("B","D")

I can filter/subset df with the values in vals with:

    dplyr::filter(df, a %in% vals)
    subset(df, a %in% vals)

Both give:

      a         b
    2 B 0.4481627
    4 D 0.2916513

What if I have the variable name in a vector, e.g.:

    > names(df)[1]
    [1] "a"

Then it doesn't work, I guess because it's quoted:

    dplyr::filter(df, names(df)[1] %in% vals)
    [1] a b
    <0 rows> (or 0-length row.names)

How do you do this?

UPDATE (what if it's dplyr::tbl_df(df)): Answers below work
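The failing call compares the string "a" itself against `vals` instead of the contents of column a. A base R sketch that looks the column up by its name with `[[`; the names `col` and `res` are illustrative, and this is not necessarily what the truncated answers proposed.

```r
set.seed(1)
df <- data.frame(a = LETTERS[1:4], b = rnorm(4))
vals <- c("B", "D")

# The column name is held as a string.
col <- names(df)[1]

# df[[col]] extracts the column itself, so %in% tests its values.
res <- df[df[[col]] %in% vals, ]
res
```

With dplyr, the `.data` pronoun serves the same purpose, as in `dplyr::filter(df, .data[[col]] %in% vals)`.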

Data Frame Subset Performance

99封情书 posted on 2019-12-07 02:07:37
Question: I have a couple of large data frames (1 million+ rows x 6-10 columns) that I need to subset repeatedly. The subsetting section is the slowest part of my code, and I am curious whether there is a way to do this faster.

    load("https://dl.dropbox.com/u/4131944/Temp/DF_IOSTAT_ALL.rda")
    start_in <- strptime("2012-08-20 13:00", "%Y-%m-%d %H:%M")
    end_in<- strptime("2012-08-20 17:00", "%Y-%m-%d %H:%M")
    system.time(DF_IOSTAT_INT <- DF_IOSTAT_ALL[DF_IOSTAT_ALL$date_stamp >= start_in & DF_IOSTAT_ALL$date_stamp <= end_in
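A small sketch of the same interval filter on synthetic data, assuming the timestamp column can be stored as POSIXct. `strptime()` returns POSIXlt, so keeping both the column and the bounds as POSIXct means the comparison works on plain numeric seconds; whether that helps in the original case would need to be timed. The data frame `toy` and its columns are made up for illustration.

```r
# 144 timestamps, ten minutes apart, covering one day.
stamps <- as.POSIXct("2012-08-20 00:00", tz = "UTC") +
  seq(0, by = 600, length.out = 144)
toy <- data.frame(date_stamp = stamps, value = seq_along(stamps))

start_in <- as.POSIXct("2012-08-20 13:00", tz = "UTC")
end_in <- as.POSIXct("2012-08-20 17:00", tz = "UTC")

# Compute the logical index once, then reuse it for the subset.
in_window <- toy$date_stamp >= start_in & toy$date_stamp <= end_in
toy_int <- toy[in_window, ]
nrow(toy_int)
```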