subset

Select rows of a data.frame that contain only numbers in a certain column

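若如初见. Posted on 2019-12-01 05:57:47
Question: How do I select only the rows that contain a number in column b?

    a <- c(1,5,3,1,-8,6,-1)
    b <- c(4,-2,1,0,"c",2,"DX")
    df <- data.frame(a,b)
    df
    #    a  b
    # 1  1  4
    # 2  5 -2
    # 3  3  1
    # 4  1  0
    # 5 -8  c
    # 6  6  2
    # 7 -1 DX

The output should look like this:

    #   a  b
    # 1 1  4
    # 2 5 -2
    # 3 3  1
    # 4 1  0
    # 5 6  2

Answer 1: You could use grep:

    # Matches any value that contains a digit, so a string such as "D2" would also pass
    df[grep("[[:digit:]]", df$b), ]
    #   a  b
    # 1 1  4
    # 2 5 -2
    # 3 3  1
    # 4 1  0
    # 6 6  2

Answer 2: This should be faster (it doesn't use regex):

    # as.character() is needed when b is a factor (the data.frame() default before R 4.0);
    # non-numeric values become NA, with a coercion warning
    df[!is.na(as.numeric(as.character(df$b))), ]

Source: https://stackoverflow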
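Both answers can be made a little more defensive. The following is a minimal sketch, not either answerer's code: it stores b as character explicitly, silences the coercion warning, and adds an anchored regular expression as an alternative. What counts as a "number" in the regex (optional sign, digits, optional decimal part) is an assumption.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps b as character
a <- c(1, 5, 3, 1, -8, 6, -1)
b <- c(4, -2, 1, 0, "c", 2, "DX")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# TRUE where the whole string parses as a number; as.character() guards
# against b being a factor, suppressWarnings() hides the coercion warning
is_num <- !is.na(suppressWarnings(as.numeric(as.character(df$b))))
numeric_rows <- df[is_num, ]

# Anchored regex alternative: optional sign, digits, optional decimal part
# (an assumed definition of "number"; it rejects strings such as "1e3")
is_num_rx <- grepl("^[-+]?[0-9]+(\\.[0-9]+)?$", df$b)
```

On this sample both tests agree. They can differ on inputs such as "1e3", which as.numeric() accepts and the regex rejects.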

Subsetting Data Frame Based on Contents of a “Column” List

戏子无情 Posted on 2019-12-01 05:46:32
Question

Set-Up: I have a list matrix, where one of the "columns" is a list (I realize it's an odd dataset to work with, but I find it useful for other operations). Each entry of the list is either (1) empty (integer(0)), (2) an integer, or (3) a vector of integers. E.g. the R object "d.f", with d.f$ID an index vector and d.f$Basket_List the list:

    ID <- c(1,2,3,4,5,6,7,8,9)
    Basket_List <- list(integer(0),c(123,987),c(123,123),456, c(456,123),456,c(123,987),c(987,123),987)
    d.f <- data.frame(ID)
    d.f
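The excerpt stops before the actual subsetting criterion is stated. As a rough sketch under assumptions, the code below attaches the list as a list column and keeps the rows whose basket contains a given value. The criterion (value 123) and the names has_123 and with_123 are hypothetical and only illustrate the general pattern.

```r
ID <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
Basket_List <- list(integer(0), c(123, 987), c(123, 123), 456,
                    c(456, 123), 456, c(123, 987), c(987, 123), 987)
d.f <- data.frame(ID)
d.f$Basket_List <- Basket_List   # attach the list as a list column

# Hypothetical criterion: keep rows whose basket contains the value 123.
# vapply() returns exactly one logical per list entry, including empty ones.
has_123 <- vapply(d.f$Basket_List, function(basket) 123 %in% basket, logical(1))
with_123 <- d.f[has_123, ]
```

Using vapply() rather than sapply() guarantees a logical vector even when an entry is integer(0).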

How do I split a data frame based on range of column values in R?

时间秒杀一切 Posted on 2019-12-01 04:44:37
I have a data set like this:

    Users Age
    1     2
    2     7
    3     10
    4     3
    5     8
    6     20

How do I split this data set into 3 data sets where the first consists of all users with ages between 0–5, second is 6–10 and third is 11–15?

You can combine split with cut to do this in a single line of code, avoiding the need to subset with a bunch of different expressions for different data ranges:

    split(dat, cut(dat$Age, c(0, 5, 10, 15), include.lowest=TRUE))
    # $`[0,5]`
    #   Users Age
    # 1     1   2
    # 4     4   3
    #
    # $`(5,10]`
    #   Users Age
    # 2     2   7
    # 3     3  10
    # 5     5   8
    #
    # $`(10,15]`
    # [1] Users Age
    # <0 rows> (or 0-length row.names)

cut splits up
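The same call can be unpacked into two steps to make the binning visible. This is a small sketch that rebuilds the sample data as dat; the intermediate names age_bin and groups are introduced here for illustration.

```r
dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

# Breaks 0, 5, 10, 15 define the bins [0,5], (5,10] and (10,15]
age_bin <- cut(dat$Age, breaks = c(0, 5, 10, 15), include.lowest = TRUE)
groups <- split(dat, age_bin)

# Age 20 lies outside every bin, so cut() returns NA for it
# and split() leaves that row out of all three groups
```

Each element of groups is an ordinary data frame, so groups[["(5,10]"]] can be used like any other subset.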

Filter by ranges supplied by two vectors, without a join operation

家住魔仙堡 Posted on 2019-12-01 04:20:47
I wish to do exactly this: "Take dates from one dataframe and filter data in another dataframe - R", except without joining, as I am afraid that the joined data will be too big to fit in memory before the filter is applied.

Here is sample data:

    tmp_df <- data.frame(a = 1:10)

I wish to do an operation that looks like this:

    lower_bound <- c(2, 4)
    upper_bound <- c(2, 5)
    tmp_df %>% filter(a >= lower_bound & a <= upper_bound) # does not work as <= is vectorised inappropriately

and my desired result is:

    > tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = F]
    #
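One way to get the desired rows without building a joined table is to accumulate a single logical mask, one range at a time. This is a base R sketch under the question's sample data, not an answer taken from the source; the names keep and filtered are introduced here.

```r
tmp_df <- data.frame(a = 1:10)
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Start with "keep nothing", then OR in each [lower, upper] range in turn.
# Only one logical vector of length nrow(tmp_df) is held at any time.
keep <- logical(nrow(tmp_df))
for (i in seq_along(lower_bound)) {
  keep <- keep | (tmp_df$a >= lower_bound[i] & tmp_df$a <= upper_bound[i])
}
filtered <- tmp_df[keep, , drop = FALSE]
```

The loop runs once per range, so it suits a modest number of ranges applied to a large data frame.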

R: Efficiently locating time series segments with maximal cross-correlation to input segment?

那年仲夏 Posted on 2019-12-01 04:08:23
I have a long numerical time series of approximately 200,000 rows (let's call it Z). In a loop, I subset x (about 30) consecutive rows from Z at a time and treat them as the query point q. I want to locate within Z the y (~300) time series segments of length x that are most correlated with q. What is an efficient way to accomplish this?

The code below finds the 300 segments you are looking for and runs in 8 seconds on my none too powerful Windows laptop, so it should be fast enough for your purposes. First, it constructs a 30-by-199971 matrix (Zmat), whose columns contain all
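The answer's code is cut off in this excerpt. The following is a scaled-down sketch of the approach it describes, assuming the columns of Zmat are all length-x windows of Z and that "most correlated" means the highest Pearson correlation. The data are made up, and y is reduced to 5 to keep the example small.

```r
set.seed(1)                       # made-up data, much shorter than the real series
Z <- cumsum(rnorm(2000))
x <- 30                           # segment length
y <- 5                            # number of best matches to keep
q <- Z[101:(100 + x)]             # query segment taken from Z itself

# x-by-(length(Z) - x + 1) matrix: column i holds Z[i], ..., Z[i + x - 1]
n_seg <- length(Z) - x + 1
Zmat <- sapply(seq_len(n_seg), function(i) Z[i:(i + x - 1)])

# Pearson correlation of q with every column in one call
cors <- as.vector(cor(q, Zmat))

# Start indices of the y segments with the highest correlation
best_starts <- order(cors, decreasing = TRUE)[seq_len(y)]
```

With 200,000 rows and x = 30 the same construction gives 200000 - 30 + 1 = 199971 columns, which matches the matrix size quoted in the answer. Overlapping neighbours of the query tend to rank highly, so a real application may want to exclude them.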

Subset by multiple conditions

╄→尐↘猪︶ㄣ Posted on 2019-12-01 04:02:40
Maybe it's something basic, but I couldn't find the answer. I have

    Id Year V1
    1  2009 33
    1  2010 67
    1  2011 38
    2  2009 45
    3  2009 65
    3  2010 74
    4  2009 47
    4  2010 51
    4  2011 14

I need to select only the rows whose Id appears in all three years 2009, 2010 and 2011:

    Id Year V1
    1  2009 33
    1  2010 67
    1  2011 38
    4  2009 47
    4  2010 51
    4  2011 14

I tried

    d1_3 <- subset(d1, Year==2009 |Year==2010 |Year==2011 )

but it doesn't work. Can anyone suggest how I can do this in R?

I think ave could be useful here. I call your original data frame 'df'. For each Id, check if 2009-2011 is
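The answer is truncated here, so the following is only a sketch of how an ave-based check might look, not the answerer's exact code. The attempt in the question fails because it tests each row on its own, and every row in the sample has one of the three years. The data frame keeps the question's name d1; the helper name complete is introduced here.

```r
d1 <- data.frame(
  Id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
  Year = c(2009, 2010, 2011, 2009, 2009, 2010, 2009, 2010, 2011),
  V1   = c(33, 67, 38, 45, 65, 74, 47, 51, 14)
)

# For each Id, flag every row with 1 if that Id has all three years, else 0.
# ave() recycles the single TRUE/FALSE over the group and stores it as numeric.
complete <- ave(d1$Year, d1$Id,
                FUN = function(yrs) all(2009:2011 %in% yrs))
d1_3 <- d1[complete == 1, ]
```

The test inside FUN is per group rather than per row, which is what the requirement "the same Id in all three years" needs.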

Subset a data frame based on column entry (or rank)

廉价感情. Posted on 2019-12-01 03:17:53
I have a data.frame as simple as this one:

    id group idu value
    1  1     1_1 34
    2  1     2_1 23
    3  1     3_1 67
    4  2     4_2 6
    5  2     5_2 24
    6  2     6_2 45
    1  3     1_3 34
    2  3     2_3 67
    3  3     3_3 76

from which I want to retrieve a subset with the first entry of each group; something like:

    id group idu value
    1  1     1_1 34
    4  2     4_2 6
    1  3     1_3 34

id is not unique, so the approach should not rely on it. Can I achieve this without loops?

dput() of data:

    structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L), group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), idu = structure(c(1L, 3L, 5L, 7L, 8L, 9L, 2L, 4L, 6L), .Label = c("1_1", "1_3", "2
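A loop-free way to take the first row of each group is to drop every row whose group value has already been seen. This is a minimal sketch that rebuilds the data from the table above (the dput() output is cut off); the name first_rows is introduced here.

```r
df <- data.frame(
  id    = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L),
  group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  idu   = c("1_1", "2_1", "3_1", "4_2", "5_2", "6_2", "1_3", "2_3", "3_3"),
  value = c(34, 23, 67, 6, 24, 45, 34, 67, 76)
)

# duplicated() is FALSE at the first occurrence of each group value,
# so negating it keeps exactly one row per group, in original order
first_rows <- df[!duplicated(df$group), ]
```

This depends only on the group column and the row order, not on id. If "first" should mean the smallest value of some column instead, sort by that column before applying duplicated().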

How do I split a data frame based on range of column values in R?

蹲街弑〆低调 Posted on 2019-12-01 02:46:57
Question: I have a data set like this:

    Users Age
    1     2
    2     7
    3     10
    4     3
    5     8
    6     20

How do I split this data set into 3 data sets where the first consists of all users with ages between 0–5, second is 6–10 and third is 11–15?

Answer 1: You can combine split with cut to do this in a single line of code, avoiding the need to subset with a bunch of different expressions for different data ranges:

    split(dat, cut(dat$Age, c(0, 5, 10, 15), include.lowest=TRUE))
    # $`[0,5]`
    #   Users Age
    # 1     1   2
    # 4     4   3
    #
    # $`(5,10]`
    # Users

Loop linear regression and saving ALL coefficients

℡╲_俬逩灬. Posted on 2019-12-01 01:51:10
Based on the question linked below, I wrote code to run a regression on subsets of my data defined by a variable.

Loop linear regression and saving coefficients

In this example I created a DUMMY (0 or 1) to define the subsets (in reality I have 3000 subsets):

    res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x){
      fit <- lm(y~x1 + x2, data=x)
      res <- data.frame(DUMMY=unique(x$DUMMY), coeff=coef(fit))
      res
    }))

This results in the following dataset:

                  DUMMY       coeff
    0.(Intercept)     0  22.8419956
    0.x1              0 -11.5623064
    0.x2              0   2.1006948
    1.(Intercept)     1   4.2020874
    1.x1              1  -0.4924303
    1.x2              1   1.0917668

What I would
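The question is cut off before it says what output is wanted, so the block below only makes the split / lapply / lm pattern runnable on made-up data. The synthetic mydata and the extra term column are assumptions added for illustration; they are not part of the original post.

```r
set.seed(42)                      # made-up data standing in for mydata
mydata <- data.frame(
  DUMMY = rep(c(0, 1), each = 20),
  x1    = rnorm(40),
  x2    = rnorm(40)
)
mydata$y <- 1 + 2 * mydata$x1 - mydata$x2 + rnorm(40)

# Fit one model per DUMMY value and stack the coefficients in long form.
# The coefficient name goes into its own column instead of the row names.
res <- do.call(rbind, lapply(split(mydata, mydata$DUMMY), function(x) {
  fit <- lm(y ~ x1 + x2, data = x)
  data.frame(DUMMY = unique(x$DUMMY), term = names(coef(fit)),
             coeff = unname(coef(fit)))
}))
```

Storing the term name as a column makes the result easier to filter or reshape than relying on row names such as 0.(Intercept).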

Subsetting based on co-occurrence within a time window

谁说我不能喝 Posted on 2019-12-01 01:43:40
I am having trouble subsetting data based on different attributes in different columns. Here is a dummy data set with species, area where it was found, and time (already in POSIXct).

    SP Time  Area
    B  07:22 1
    F  09:22 4
    A  09:22 1
    C  08:17 3
    D  09:20 1
    E  06:55 4
    D  09:03 1
    E  09:12 2
    F  09:45 1
    B  09:15 1

I need to subset the rows that have SP==A, plus all other species occurring in the same area (in this case 1) within a time window of 30 minutes before or after, returning this:

    SP Time  Area
    A  09:22 1
    D  09:20 1
    D  09:03 1
    F  09:45 1
    B  09:15 1

I can't get past the conditional statement of this 1-hour window,
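The window condition can be written as an absolute time difference. This is a base R sketch under assumptions: the calendar date is arbitrary because the question shows only clock times, there is a single SP == "A" record as in the sample, and the names obs, focal, gap_min and nearby are introduced here.

```r
obs <- data.frame(
  SP   = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B"),
  Time = c("07:22", "09:22", "09:22", "08:17", "09:20",
           "06:55", "09:03", "09:12", "09:45", "09:15"),
  Area = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)
# The date is an arbitrary assumption; the question only shows clock times
obs$Time <- as.POSIXct(paste("2019-01-01", obs$Time), tz = "UTC")

focal <- obs[obs$SP == "A", ]     # assumes a single focal record, as in the sample

# Same area, and at most 30 minutes before or after the focal record
gap_min <- abs(as.numeric(difftime(obs$Time, focal$Time, units = "mins")))
nearby  <- obs[obs$Area == focal$Area & gap_min <= 30, ]
```

Passing units = "mins" to difftime() fixes the unit explicitly, so the comparison with 30 does not depend on the unit R would otherwise pick automatically. With several focal records, the same test would have to be repeated per record and the results combined.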