subset | 易学教程

Merging two data.frames by key column

Question: I have two data frames. The first has a KEY/ID column and two variables:

KEY V1 V2
1   10 2
2   20 4
3   30 6
4   40 8
5   50 10

The second has a KEY/ID column and a third variable:

KEY V3
1   5
2   10
3   20

I would like to extract the rows of the first data frame that also appear in the second, matching them on the KEY column. I would also like to add the V3 column to the final dataset:

KEY V1 V2 V3
1   10 2  5
2   20 4  10
3   30 6  20

These are my attempts by using the
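
The result the question asks for is an inner join on KEY. The sketch below illustrates the idea in plain Python, using lists of dictionaries as hypothetical stand-ins for the two data frames; the name `inner_join` and the assumption that KEY is unique in the second table are mine, not the asker's. In R itself, base `merge(df1, df2, by = "KEY")` is the usual way to express the same join.

```python
# Hypothetical stand-ins for the two data frames: one dict per row.
df1 = [
    {"KEY": 1, "V1": 10, "V2": 2},
    {"KEY": 2, "V1": 20, "V2": 4},
    {"KEY": 3, "V1": 30, "V2": 6},
    {"KEY": 4, "V1": 40, "V2": 8},
    {"KEY": 5, "V1": 50, "V2": 10},
]
df2 = [
    {"KEY": 1, "V3": 5},
    {"KEY": 2, "V3": 10},
    {"KEY": 3, "V3": 20},
]


def inner_join(left, right, key):
    """Keep the rows of `left` whose key also occurs in `right`,
    and add the columns of the matching `right` row."""
    # Assumption: each key occurs at most once in `right`.
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]


merged = inner_join(df1, df2, "KEY")
for row in merged:
    print(row)
```

Building the lookup dictionary first keeps the join linear in the number of rows instead of comparing every pair of rows.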

recoding variables in R with a lookup table

Question: I have a question about recoding data. I would like to use a lookup table, and I am wondering how to recode NA and use an approach similar to %in%. Sample data:

    gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)
    df_gender <- as.data.frame(gender)
    df_gender$gender <- as.character(gender)

My first approach to recoding is:

    df_gender$gender[df_gender$gender == "Female"] <- "F"
    df_gender$gender[df_gender$gender == "Male"] <- "M"
    df_gender$gender[df_gender$gender %in% c(
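
As a rough illustration of the lookup-table idea, here is a minimal Python sketch in which a dictionary plays the role of the table and `None` stands in for R's NA. The excerpt is cut off before it shows how the remaining values are recoded, so the code "U" for "Not Disclosed", "Unknown" and missing values is purely an assumption made for this example.

```python
# Sample data from the question; None stands in for R's NA.
gender = ["Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", None]

# Hypothetical lookup table. Only "F" and "M" come from the excerpt;
# the "U" code for the other values is an assumption for illustration.
lookup = {
    "Female": "F",
    "Male": "M",
    "Not Disclosed": "U",
    "Unknown": "U",
    None: "U",
}

# Values missing from the table are left unchanged.
recoded = [lookup.get(value, value) for value in gender]
print(recoded)
```

Keeping every mapping, including the one for the missing value, in a single table avoids a separate assignment statement per category.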

Get all possible subsets - preserving order

Question: This is a follow-up to this question: Generate all "unique" subsets of a set (not a powerset). My problem is the same, but I think there might be a more optimized solution when the order of items within the new subsets and across the subsets needs to be preserved. Example: [1, 2, 3] would result in:

[[1], [2], [3]]
[[1, 2], [3]]
[[1], [2, 3]]
[[1, 2, 3]]

Answer 1: I've already answered this question for Python, so I quickly ported my solution over to Ruby: def spannings(lst) return enum_for(:spannings, lst
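
The answer's own code is cut off, so the following is not a reconstruction of it. It is a minimal Python sketch of one way to enumerate the order-preserving splits from the example: pick every possible first block, then recurse on the remainder. The function name `spannings` is borrowed from the excerpt; the body is an assumption.

```python
def spannings(lst):
    """Yield every way to split `lst` into contiguous, non-empty blocks,
    keeping the original order inside and across blocks."""
    if not lst:
        yield []
        return
    for i in range(1, len(lst) + 1):
        head = lst[:i]                      # first block: the first i items
        for rest in spannings(lst[i:]):     # all splits of the remainder
            yield [head] + rest


for split in spannings([1, 2, 3]):
    print(split)
```

A list of n items has 2^(n-1) such splits, because each of the n-1 gaps between neighbouring items is either a cut or not.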

How can I obtain the largest set of rows that share a common set of at least 4 columns?

Question: I have a matrix containing gene names and sample numbers. Each row is a logical vector indicating the samples in which a gene was detected. A gene must appear in at least 4 of the 8 samples to still be in the matrix, i.e., all genes in this matrix appear in 4 or more samples.

      Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
gene1 TRUE    FALSE   TRUE    TRUE    TRUE    FALSE   FALSE   FALSE
gene2 FALSE   TRUE    FALSE   TRUE    FALSE   TRUE    TRUE    FALSE
gene3 TRUE    TRUE    FALSE   TRUE    FALSE

subset rows + context

Question: I haven't been able to figure out an easy way to include some context (n adjacent rows) around the rows I want to select. I am more or less trying to mirror the -C option of grep to select some rows of a data.frame. Example:

    a = data.frame(seq(1:100))
    b = c(50, 60, 61)

Let's say I want a context of 2 lines around the rows indexed in b; the desired output should be the subset of a with the rows 48, 49, 50, 51, 52, 58, 59, 60, 61, 62, 63.

Answer 1: You can do something like this, but there may be a more
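
The answer text is truncated, so the sketch below is not the answerer's method. It only illustrates, in Python, how the row numbers for a grep -C style selection could be computed: expand each selected index by the context width, clamp to the table bounds, and merge overlaps with a set. The name `with_context` and its parameters are hypothetical.

```python
def with_context(indices, context, n_rows):
    """Return the sorted 1-based row numbers that lie within `context`
    rows of any index in `indices`, clamped to 1..n_rows."""
    keep = set()
    for i in indices:
        lo = max(1, i - context)        # do not run past the first row
        hi = min(n_rows, i + context)   # do not run past the last row
        keep.update(range(lo, hi + 1))
    return sorted(keep)


rows = with_context([50, 60, 61], context=2, n_rows=100)
print(rows)
```

Using a set means overlapping windows, such as those around rows 60 and 61, contribute each row only once.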

Subset a data frame based on column entry (or rank)

Question: I have a data.frame as simple as this one:

id group idu value
1  1     1_1 34
2  1     2_1 23
3  1     3_1 67
4  2     4_2 6
5  2     5_2 24
6  2     6_2 45
1  3     1_3 34
2  3     2_3 67
3  3     3_3 76

from which I want to retrieve a subset with the first entry of each group, something like:

id group idu value
1  1     1_1 34
4  2     4_2 6
1  3     1_3 34

id is not unique, so the approach should not rely on it. Can I achieve this without loops? dput() of data: structure(list(id = c(1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L), group = c(1L, 1L, 1L, 2L,
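
To make the selection rule concrete, here is a small Python sketch that keeps the first row seen for each group value. The rows are copied from the table in the question; the tuple layout and the name `first_per_group` are assumptions for illustration. In R, a commonly used loop-free equivalent is `df[!duplicated(df$group), ]`.

```python
# Rows from the question as (id, group, idu, value) tuples.
rows = [
    (1, 1, "1_1", 34), (2, 1, "2_1", 23), (3, 1, "3_1", 67),
    (4, 2, "4_2", 6), (5, 2, "5_2", 24), (6, 2, "6_2", 45),
    (1, 3, "1_3", 34), (2, 3, "2_3", 67), (3, 3, "3_3", 76),
]


def first_per_group(rows, group_index):
    """Return the first row encountered for each distinct group value."""
    seen = set()
    firsts = []
    for row in rows:
        group = row[group_index]
        if group not in seen:       # first time this group shows up
            seen.add(group)
            firsts.append(row)
    return firsts


subset = first_per_group(rows, group_index=1)
print(subset)
```

The rule depends only on the group column and the row order, so the non-unique id column plays no part in it.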

Count the total number of subsets that don't have consecutive elements

Question: I'm trying to solve a pretty complex problem involving combinatorics and counting subsets. First of all, let's say we are given a set A = {1, 2, 3, ..., N} where N <= 10^(18). Now we want to count the subsets that don't contain consecutive numbers. Example: let's say N = 3 and A = {1, 2, 3}. There are 2^3 subsets in total, but we don't want to count the subsets (1,2), (2,3) and (1,2,3). So the answer for this example is 5, because we want to count only the remaining 5 subsets.
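
One standard way to reason about this count: let c(n) be the number of subsets of {1, ..., n} without consecutive elements. Either n is left out, giving c(n-1) choices, or n is included and n-1 is excluded, giving c(n-2) choices. With c(0) = 1 and c(1) = 2, this yields c(n) = F(n+2), where F is the Fibonacci sequence with F(1) = F(2) = 1. The sketch below evaluates F(n+2) by fast doubling in O(log n) steps. The excerpt does not state a modulus; since the exact value for N near 10^18 is far too large to write out, the `mod` parameter and its default are assumptions.

```python
def fib_pair(n, mod):
    """Return (F(n) % mod, F(n + 1) % mod) using fast doubling."""
    if n == 0:
        return (0, 1 % mod)
    a, b = fib_pair(n // 2, mod)            # a = F(k), b = F(k+1), k = n // 2
    c = (a * ((2 * b - a) % mod)) % mod     # F(2k)   = F(k) * (2*F(k+1) - F(k))
    d = (a * a + b * b) % mod               # F(2k+1) = F(k)^2 + F(k+1)^2
    if n % 2 == 0:
        return (c, d)
    return (d, (c + d) % mod)


def count_non_consecutive_subsets(n, mod=10**9 + 7):
    """Count subsets of {1..n} with no two consecutive elements, modulo `mod`.
    The modulus is a hypothetical choice, not taken from the question."""
    return fib_pair(n + 2, mod)[0]


print(count_non_consecutive_subsets(3))  # the N = 3 example from the question
```

The count includes the empty set, consistent with the question's arithmetic of 2^3 subsets minus the 3 excluded ones.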

Subset a data frame based on value pairs stored in independent ordered vectors

Question: I have an R data frame that I need to subset. The subsetting will be based on two columns of the data frame. For example:

    A <- c(1,2,3,3,5,1)
    B <- c(6,7,8,9,8,8)
    Value <- c(9,5,2,1,2,2)
    DATA <- data.frame(A,B,Value)

This is how DATA looks:

A B Value
1 6 9
2 7 5
3 8 2
3 9 1
5 8 2
1 8 2

I want the rows for which the (A,B) combination is (1,6) or (3,8). These pairs are stored as individual (ordered) vectors of A and B:

    AList <- c(1,3)
    BList <- c(6,8)

Now, I am trying to subset the
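
The key point is that the two vectors must be matched as pairs, position by position, rather than filtered independently (which would also let the row (1, 8) through). A minimal Python sketch of that idea, with plain lists standing in for the data frame columns and `a_list` / `b_list` mirroring AList / BList:

```python
# Columns of DATA from the question.
A = [1, 2, 3, 3, 5, 1]
B = [6, 7, 8, 9, 8, 8]
Value = [9, 5, 2, 1, 2, 2]

# The wanted (A, B) pairs, stored as two parallel ordered vectors.
a_list = [1, 3]
b_list = [6, 8]

# Zip the parallel vectors into a set of pairs: {(1, 6), (3, 8)}.
wanted = set(zip(a_list, b_list))

# Keep only the rows whose (A, B) combination is one of the wanted pairs.
subset = [(a, b, v) for a, b, v in zip(A, B, Value) if (a, b) in wanted]
print(subset)
```

Testing membership of the whole pair is what excludes the last row, where A = 1 and B = 8 each appear in the lists but not together.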

Identifying and subsetting data points per variable on concentric circles

Question: I have a data frame that looks like this:

    structure(list(A = c(10, 10, 10, 10, 10, 10), T = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6), X = c(673.05, 672.3, 672.3, 672.3, 667.82, 667.82), Y = c(203.93, 203.93, 203.93, 203.93, 209.16, 209.16), V = c(14.79, 14.94, 0, 12.677, 14.94, 14.94)), .Names = c("A", "T", "X", "Y", "V"), row.names = c(NA, 6L), class = "data.frame")

Briefly, my data are x,y positions of a specific object (A). I want to subset my data for a specific time (T) in a specific position (X

Subset list based on value of dictionary element

Question: I have a list which is made up of dictionaries. I wish to subset the list, selecting dictionaries based on a comparison of element values (in this case, selecting only one dictionary per date, namely the one with the largest realtime_start value). An example list is: obs = [{'date': '2012-10-01', 'realtime_end': '2013-02-18', 'realtime_start': '2012-11-15', 'value': '231.751'}, {'date': '2012-10-01', 'realtime_end': '9999-12-31', 'realtime_start': '2012-12-19',
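
One possible approach, sketched under assumptions: walk the list once and remember, for each date, the dictionary with the largest realtime_start seen so far. Because the example list is cut off, the sample data below is hypothetical and only modelled on the visible entries (the second and third values are invented for illustration). The function name `latest_per_date` is also an assumption.

```python
# Hypothetical sample data modelled on the truncated example list.
obs = [
    {"date": "2012-10-01", "realtime_start": "2012-11-15", "value": "231.751"},
    {"date": "2012-10-01", "realtime_start": "2012-12-19", "value": "231.623"},  # invented value
    {"date": "2012-11-01", "realtime_start": "2012-12-14", "value": "231.071"},  # invented row
]


def latest_per_date(observations):
    """Keep one dictionary per date: the one with the largest realtime_start."""
    best = {}
    for ob in observations:
        current = best.get(ob["date"])
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
        if current is None or ob["realtime_start"] > current["realtime_start"]:
            best[ob["date"]] = ob
    return list(best.values())


selected = latest_per_date(obs)
for ob in selected:
    print(ob["date"], ob["realtime_start"], ob["value"])
```

The result keeps the dates in order of first appearance, since Python dictionaries preserve insertion order (Python 3.7 and later).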