subset

subset in parallel using a list of dataframes and a list of vectors

空扰寡人 submitted on 2019-12-12 04:32:45
Question: This works:

    onion$yearone$id %in% mask$yearone

This doesn't:

    onion[1][1] %in% mask[1]
    onion[1]['id'] %in% mask[1]

Why? Short of an obvious way to vectorize in parallel over the columns in DF and in memberids (so that, within each year, I only get rows whose ids are present in both DF and memberids), I'm using a for loop, but I'm having no luck finding the right way to express the index... Help? Example data:

    yearone <- data.frame(id=c("b","b","c","a","a"),v=rnorm(5))
    onion <- list()
    onion[[1]] <- yearone
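The difference between the working and failing forms in this question comes down to `[` versus `[[` on a list. The sketch below illustrates that and one possible way to filter in parallel; it is not the accepted answer. The contents of `mask` are not shown in the excerpt, so the named list of id vectors used here is an assumption, as is naming the list element `yearone`.

```r
# Hypothetical data: `mask` is not shown in the question, so its shape is assumed.
set.seed(1)
yearone <- data.frame(id = c("b", "b", "c", "a", "a"), v = rnorm(5))
onion <- list(yearone = yearone)
mask  <- list(yearone = c("a", "c"))

# `[` keeps the list wrapper; `[[` (or `$`) extracts the element itself.
class(onion[1])     # "list"
class(onion[[1]])   # "data.frame"

# Compare the id column of the extracted data frame with the extracted id vector.
keep <- onion[[1]]$id %in% mask[[1]]

# Walk both lists in parallel, filtering each data frame by its own id vector.
filtered <- Map(function(df, ids) df[df$id %in% ids, , drop = FALSE], onion, mask)
```

`Map` pairs the first data frame with the first id vector, the second with the second, and so on, so no explicit index is needed.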

Create a list of a list of dataframes, by subsetting a list of dataframes in R

半城伤御伤魂 submitted on 2019-12-12 04:22:27
Question: I have a list of 6 dataframes, and I would like to create a list of 6 lists of 24 dataframes, the 24 dataframes being subsets of the original 6 dataframes. Here is a shorter example of what I'm trying to do:

    months <- c(0:35)
    product<- c(112:147)
    index <- rnorm(36)
    df1 <- data.frame(months, product, index)
    product2<- c(212:247)
    index2 <- rnorm(36)
    df2 <- data.frame(months, product2, index2)
    product3<- c(312:347)
    index3 <- rnorm(36)
    df3 <- data.frame(months, product3, index3)
    dflist <- list

R- rationale for recycling boolean indices for selection

£可爱£侵袭症+ submitted on 2019-12-12 03:56:25
Question: The title is self-explanatory. I would like to know why R has chosen to recycle boolean values for selection/subsetting. The documentation for "[" states, for the arguments i, j: "Such vectors are recycled if necessary to match the corresponding extent." Are there any advantages to doing this? I could think of one, as mentioned below, but I'd think the disadvantages might outweigh the benefits of ease of use.

    df<- data.frame(C1=1:10,c2=101:110)
    class(unclass(df)[1]) # df is a list of two lists, each a column of df
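To make the recycling behaviour quoted from the documentation concrete, here is a minimal sketch. The variable names `x` and `odd_rows` are illustrative and not from the question; the data frame matches the one in the excerpt.

```r
x <- 1:10

# A length-2 logical index is recycled along all 10 elements,
# so it selects every other element starting with the first.
x[c(TRUE, FALSE)]   # 1 3 5 7 9

# The same recycling applies to the row index of a data frame.
df <- data.frame(C1 = 1:10, c2 = 101:110)
odd_rows <- df[c(TRUE, FALSE), ]
```

This shows the convenience the question alludes to: a short pattern selects a regular stride. The usual downside is that a logical index of the wrong length is silently stretched rather than rejected.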

R different behavior when accessing columns from within function as opposed to interactively

雨燕双飞 submitted on 2019-12-12 03:51:03
Question: I have a data frame named granular that contains, in relevant part, a factor column GranularClass, one of whose values is "Constitutional Law I Spring 2016", and several numeric columns, for example Knowledge. The numeric columns contain NAs. I'm trying to write a function that counts the non-NA values for a given column, conditional on a given factor value. However, my attempt to count the values behaves differently depending on whether I write it as a function or just use it in the console

How to subset dataframe using string values from a list?

筅森魡賤 submitted on 2019-12-12 03:44:12
Question: I have a data.frame with several variables for a universe of stocks, and I want to create a subset of this data frame that keeps only the data for the S&P 500 stocks. I created a list of all the stocks in the S&P 500, and I basically want the program to go through my data frame and copy over all the rows which contain an item from my S&P 500 list. I tried using a for-loop and that crashed my RStudio, so if anyone knows a way I can do this, please let me know! This code
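The row filter described in this question can usually be written without a loop. The sketch below is one common approach and not the asker's code: the data frame `stocks`, its `ticker` column, the ticker values, and the vector `sp500` are all hypothetical, since the excerpt does not show the real names.

```r
# Hypothetical data: column and object names are assumptions for illustration.
stocks <- data.frame(
  ticker = c("AAPL", "XYZ1", "MSFT", "XYZ2"),
  price  = c(10, 20, 30, 40)
)
sp500 <- c("AAPL", "MSFT")

# %in% is vectorized: it returns one TRUE/FALSE per row,
# and that logical vector is used as the row index.
sp500_rows <- stocks[stocks$ticker %in% sp500, ]
```

If the S&P 500 names are stored in an actual R list rather than a character vector, `unlist(sp500)` would probably be needed before the comparison.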

Subset data from a part of string

元气小坏坏 submitted on 2019-12-12 03:28:27
Question: I have the following dataset:

    dat2 <- read.table(header=TRUE, text="
    ID        De    Ep    Ti    ID1
    A1123     A117  121   100   11231
    A1123MDN  A108  C207  D110  E11232
    A1124MDN  A122  C207  D110  E11232
    A1124MDN  A117  C207  D110  E11232
    A1124     A122  C208  D110  E11232
    B1125MDN  A108  C208  D110  E11232
    B1125MDN  A108  C208  D110  E11232
    B1126MDN  A122  C208  D110  E11233
    C1126     A109  C208  D111  E11233
    ")

    dat2
      ID        De    Ep    Ti    ID1
    1 A1123     A117  121   100   11231
    2 A1123MDN  A108  C207  D110  E11232
    3 A1124MDN  A122  C207  D110  E11232
    4 A1124MDN  A117  C207  D110

pasting two dataframes of different sizes

こ雲淡風輕ζ submitted on 2019-12-12 02:14:48
Question: I would like to paste strings from 2 dfs, n and p (dput at the end). They have different sizes, nrow(n) = 25 and nrow(p) = 20, with two factors: factor1 (binary) and factor2 (integers).

    head(n,3)                       head(p,3)
    string  factor1  factor2        string  factor1  factor2
    --      --       --             --      --       --
    h       f1       5              i       f1       1
    h       f1       6              c       f1       2
    h       f1       7              c       f1       3

    tail(n,3)                       tail(p,3)
    string  factor1  factor2        string  factor1  factor2
    --      --       --             --      --       --
    a       f2       27             h       f2       18
    g       f2       28             i       f2       19
    b       f2       29             i       f2       20

Here, I would like to create a dataframe which does not

Subsetting rows from Matlab for which specific column has value greater than zero

删除回忆录丶 submitted on 2019-12-12 01:57:56
Question: I want to subset the rows of a matrix for which the value in the third column is greater than zero. For example, I have a matrix:

    test =
        1 2 3
        4 5 0
        4 4 1
        4 4 0

Now I want to subset it so that I have

    subset =
        1 2 3
        4 4 1

Any quick suggestion on how I can do this in MATLAB?

Answer 1: Simply make a logical array that is true for every row you want to keep, and pass it as the index to the rows:

    subset = test(test(:,3)>0, :)

Source: https://stackoverflow.com/questions/27226622/subsetting-rows-from-matlab-for
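For readers coming from the R questions on this page, the same logical-row-index idea can be sketched in R. This is an analogue of the MATLAB answer above, not part of it; the names `keep` and `subset_rows` are illustrative.

```r
# The matrix from the question, filled row by row.
test <- matrix(c(1, 2, 3,
                 4, 5, 0,
                 4, 4, 1,
                 4, 4, 0), ncol = 3, byrow = TRUE)

# One TRUE/FALSE per row: TRUE where the third column is greater than zero.
keep <- test[, 3] > 0

# Use the logical vector as the row index; drop = FALSE keeps the result
# a matrix even if only one row survives.
subset_rows <- test[keep, , drop = FALSE]
```

With this input, `subset_rows` holds the rows `1 2 3` and `4 4 1`, matching the result requested in the question.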

LINQ: Compare two lists and count subset

喜欢而已 submitted on 2019-12-12 01:53:38
Question: I am comparing 2 lists and I need to collect occurrences of a subset (modulesToDelete) from the master list (allModules) ONLY when MORE than one occurrence is found. (allModules contains modulesToDelete.) Multiple occurrences of any module in modulesToDelete mean that the module is being shared. One occurrence of a module in modulesToDelete means that the module is isolated and is safe to delete (it just found itself). I can do this with nested foreach loops, but this is as far as I got with a

Can we conclude a set might not be random by checking its subset?

為{幸葍}努か submitted on 2019-12-12 01:37:48
Question: Set A includes 1000 numbers. I checked that half of the numbers in this set are even. I extracted subset B from set A as follows: any number in set A which starts with 1 is also in set B. (All numbers in B start with 1.) I checked that more than half of the numbers in set B are even. Half of the numbers in A are even, so should we expect the same for B? But more than half of B are even. So can we conclude that set A is not random? If 60% of B are even, can we still conclude A is not generated