subset | 易学教程

Subsetting a data frame based on key spanning several columns in another (summary) data frame

I have a data frame a with four identifying columns: A, B, C and D. A second data frame b, created with ddply(), contains a summary of all the values for the different Ds for every combination of A, B and C. A third data frame c contains the subset of b with bad values that I want to delete from a. In other words, I want the subset of a that omits every row whose combination of A, B and C is also present in c. I can think of (ugly and inefficient) ways to do this in a loop, but my DBA background encourages me to look for a solution that is a little more … direct. In code: a <- data.frame( A=rep(c('2013-10
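In SQL terms this is an anti-join, and dplyr provides it directly as anti_join(a, c, by = c("A", "B", "C")). The sketch below shows the same idea in plain Python rather than R, as an illustration only: build the set of bad (A, B, C) keys once, then keep every row whose key is not in that set, which avoids a nested loop. The function name anti_join here and all row values are hypothetical.

```python
def anti_join(rows, bad_rows, key_cols):
    """Return the rows of `rows` whose key is absent from `bad_rows`.

    A key is the tuple of values in `key_cols`; a set lookup per row
    keeps this linear instead of comparing every pair of rows.
    """
    bad_keys = {tuple(r[k] for k in key_cols) for r in bad_rows}
    return [r for r in rows if tuple(r[k] for k in key_cols) not in bad_keys]


# Hypothetical data: `a` has A, B, C, D; `c` lists bad (A, B, C) combinations.
a = [
    {"A": "x", "B": 1, "C": "p", "D": 10},
    {"A": "x", "B": 1, "C": "p", "D": 20},
    {"A": "y", "B": 2, "C": "q", "D": 10},
]
c = [{"A": "x", "B": 1, "C": "p", "n": 2}]

kept = anti_join(a, c, ["A", "B", "C"])
print(kept)  # only the ("y", 2, "q") row survives
```

Extra columns in c (such as the summary count n) are ignored, because only the key columns are compared.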

R - subset column based on condition on duplicate rows

I have a data frame with a repeated ID column and a site count. I want to know how to remove duplicate ID records, but only when their Site_count is greater than 0. To generate DF:

    DF <- data.frame('ID' = sample(100:300, 100, replace = T), 'Site_count' = sample(0:1, 100, replace = T))

My attempt at the subset:

    subset(DF[!duplicated(DF$ID), ], Site_count > 0)

But this removes all rows with a site count of 0; I only want to remove a record when it is a duplicate with a site count greater than 0. The desired result would look something like this (notice that there are site IDs with 0
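One reading of the requirement (an assumption, since the desired output is cut off): keep every row whose Site_count is 0, and drop a row with Site_count greater than 0 only when its ID has already appeared earlier. In base R that rule would be DF[!(duplicated(DF$ID) & DF$Site_count > 0), ]. The sketch below spells out the same rule in plain Python on hypothetical rows; the function name is made up for illustration.

```python
def drop_positive_duplicates(rows):
    """Drop a row only if its ID was seen before AND its Site_count > 0.

    Like R's duplicated(), the first occurrence of an ID is never a duplicate.
    """
    seen = set()
    kept = []
    for row in rows:
        is_duplicate = row["ID"] in seen
        seen.add(row["ID"])
        if is_duplicate and row["Site_count"] > 0:
            continue
        kept.append(row)
    return kept


# Hypothetical rows (the question generates random data with sample()).
df = [
    {"ID": 101, "Site_count": 0},
    {"ID": 101, "Site_count": 1},  # duplicate with a positive count: dropped
    {"ID": 102, "Site_count": 1},
    {"ID": 102, "Site_count": 0},  # duplicate with a zero count: kept
    {"ID": 103, "Site_count": 1},
]
result = drop_positive_duplicates(df)
print([(r["ID"], r["Site_count"]) for r in result])
# [(101, 0), (102, 1), (102, 0), (103, 1)]
```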

R Subsetting a data.frame when 2 columns have different values

I have a data.frame like this:

          Type1 rep1    Type2 rep2    stat p.value
    17    DqSAD    1 rnzDqSAD    9  3.7946  0.0101
    18    DqSAD    1    DqSAD   10 -0.5278  0.6428
    19    DqSAD    1 rnzDqSAD   10  0.4111  0.2231
    20 rnzDqSAD    1    DqSAD    2 -0.3111  0.5085
    21 rnzDqSAD    1 rnzDqSAD    2 -0.8904  0.9080

and I would like to subset it to the rows where the columns Type1 and Type2 have different values. I mean in an automatic way, without explicitly checking for particular values such as Type1=="DqSAD" & Type2=="rnzDqSAD". I remember this can be done with SQL, but I can't figure out how to do it in R. Thanks!

You can do this by finding the rows where Type1 and
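No specific values need to be named: the test is simply Type1 != Type2 on each row. Assuming the data frame is called df, subset(df, as.character(Type1) != as.character(Type2)) should work in R; the as.character() calls avoid the error R raises when comparing factors whose level sets differ. The Python sketch below applies the same row-wise test to the five rows from the question; the namedtuple layout is an assumption for illustration.

```python
from collections import namedtuple

Row = namedtuple("Row", "name Type1 rep1 Type2 rep2 stat p_value")

# The five rows from the question (row names 17 to 21).
data = [
    Row(17, "DqSAD", 1, "rnzDqSAD", 9, 3.7946, 0.0101),
    Row(18, "DqSAD", 1, "DqSAD", 10, -0.5278, 0.6428),
    Row(19, "DqSAD", 1, "rnzDqSAD", 10, 0.4111, 0.2231),
    Row(20, "rnzDqSAD", 1, "DqSAD", 2, -0.3111, 0.5085),
    Row(21, "rnzDqSAD", 1, "rnzDqSAD", 2, -0.8904, 0.9080),
]

# Keep rows whose two type columns differ, without naming any specific value.
different = [r for r in data if r.Type1 != r.Type2]
print([r.name for r in different])  # [17, 19, 20]
```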

Find all possible subsets that sum up to a given number

I'm learning Python and I have a problem with what seems to be a simple task. I want to find all possible combinations of numbers that sum up to a given number, for example:

4 -> [1,1,1,1] [1,1,2] [2,2] [1,3]

I picked a solution that generates all possible subsets (2^n) and then yields only those whose sum equals the number. I have a problem with the condition. Code:

    def allSum(number):
        #mask = [0] * number
        for i in xrange(2**number):
            subSet = []
            for j in xrange(number):
                #if :
                subSet.append(j)
            if sum(subSet) == number:
                yield subSet

    for i in allSum(4):
        print i

By the way, is this a good approach? Here's
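The expected outputs repeat numbers ([1,1,2], [2,2]), so they are integer partitions rather than subsets: a subset cannot contain the same number twice, so no 2^n subset enumeration will produce them. The recursive sketch below swaps in a different technique from the question's (and is written for Python 3): it generates each partition exactly once by keeping the parts in non-decreasing order. Excluding the single-part [4] is an assumption taken from the example output; the function names are hypothetical.

```python
def partitions(n, min_part=1):
    """Yield every list of positive integers in non-decreasing order summing to n."""
    if n == 0:
        yield []
        return
    for first in range(min_part, n + 1):
        # Later parts must be >= first, so each partition appears exactly once.
        for rest in partitions(n - first, first):
            yield [first] + rest


def all_sums(number):
    """Partitions of `number` with at least two parts, matching the question's example."""
    return [p for p in partitions(number) if len(p) > 1]


print(all_sums(4))  # [[1, 1, 1, 1], [1, 1, 2], [1, 3], [2, 2]]
```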

Subset data.table keeping only elements greater than a certain value, applied to all columns

I would like to subset news (below) to create news2 (further below), which should include only the rows/columns where abs(value) of an element of news is greater than 0.01. Below is the code that I have tried:

    gr <- data.frame(which(abs(news[, 1:ncol(news), with = FALSE]) > 0.01, arr.ind = TRUE))
    news2a <- news[gr$row, c(1, gr$col + 1L), with = FALSE]
    news2a[, which(duplicated(names(news2a))) := NULL]

The code above does not always work. Note: the real data set has more rows and more columns.

    # news
       ID         diff.jan        diff.feb         diff.mar diff.apr
    1:  7 -2.998852570e-13 2.764079712e-13 -3.291735832e-13        0
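Because news2 itself is not shown, the rule used here is an assumption: keep a row if any of its values exceeds 0.01 in absolute value, keep a column under the same condition, and always carry the ID along. The plain-Python sketch below applies that rule to a small hypothetical table instead of a data.table. Computing both keep-lists first and slicing once avoids the duplicated-column clean-up step that the attempted code needs.

```python
def filter_table(ids, columns, values, threshold=0.01):
    """Keep rows and columns containing at least one |value| > threshold.

    ids: row identifiers; columns: value column names;
    values: list of rows, each a list aligned with `columns`.
    """
    keep_rows = [i for i, row in enumerate(values)
                 if any(abs(v) > threshold for v in row)]
    keep_cols = [j for j in range(len(columns))
                 if any(abs(values[i][j]) > threshold for i in keep_rows)]
    new_ids = [ids[i] for i in keep_rows]
    new_columns = [columns[j] for j in keep_cols]
    new_values = [[values[i][j] for j in keep_cols] for i in keep_rows]
    return new_ids, new_columns, new_values


# Hypothetical data shaped like `news`: an ID plus monthly differences.
ids = [7, 8, 9]
columns = ["diff.jan", "diff.feb", "diff.mar"]
values = [
    [-3.0e-13, 2.8e-13, 0.0],
    [0.5, 1.0e-12, -0.02],
    [0.0, 0.0, 0.0],
]
print(filter_table(ids, columns, values))
# ([8], ['diff.jan', 'diff.mar'], [[0.5, -0.02]])
```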

Why use \0 to include highEndPoint as part of the sublist?

I saw the code below in Oracle's Java Tutorials. To count the number of words between doorbell (inclusive) and pickle (inclusive), the author appended \0 to the word pickle. I understand that the effect of appending \0 to pickle is that pickle is now included in the subset. But my question is: why use \0? Could someone please help me out? Thanks in advance for any help! SortedSet<String> dictionary = new TreeSet<>(entire collection of words from a dictionary); int
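SortedSet.subSet(fromElement, toElement) is half-open: fromElement is included and toElement is excluded. Because \0 is the smallest possible character, "pickle\0" is the smallest string that sorts after "pickle", so the range ending just before "pickle\0" contains pickle itself and nothing beyond it, not even "pickles". The sketch below reproduces that half-open search with Python's bisect module on a hypothetical word list rather than a Java TreeSet; count_between is a made-up helper.

```python
from bisect import bisect_left


def count_between(sorted_words, low, high, include_high):
    """Count words in [low, high) or, with include_high, in [low, high].

    bisect_left gives a half-open search like Java's SortedSet.subSet;
    appending "\\0" to high moves the exclusive end just past high.
    """
    end_key = high + "\0" if include_high else high
    return bisect_left(sorted_words, end_key) - bisect_left(sorted_words, low)


# Hypothetical dictionary, already sorted.
words = sorted(["apple", "doorbell", "egg", "pickle", "pickles", "zebra"])
print(count_between(words, "doorbell", "pickle", include_high=False))  # 2
print(count_between(words, "doorbell", "pickle", include_high=True))   # 3
```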

How to subset data for a specific column with ddply?

I would like to know if there is a simple way to achieve what I describe below using ddply. My data frame describes an experiment with two conditions. Participants had to choose between options A and B, and we recorded how long they took to decide and whether their responses were accurate. I use ddply to compute averages by condition. The column nAccurate summarizes the number of accurate responses in each condition. I also want to know how much time they took to decide and express

Wrapper for a function relying on non-standard evaluation in R

I wrote a wrapper around ftable because I need to compute flat tables with frequencies and percentages for many variables:

    mytable <- function(...) {
      tab <- ftable(..., exclude = NULL)
      prop <- prop.table(x = tab, margin = 2) * 100
      bind <- cbind(as.matrix(x = tab), as.matrix(x = prop))
      margin <- addmargins(A = bind, margin = 1)
      round(x = margin, digits = 1)
    }

    mytable(formula = wool + tension ~ breaks, data = warpbreaks)

       A_L A_M A_H B_L B_M B_H A_L A_M A_H B_L B_M B_H
    10   0   0   1   0   0   0 0.0 0.0 11.1 0

Reason for unexpected output in subsetting data frame - R

I have the data frame "a", which has a variable called "VAL". I want to count the elements where the value of VAL is 23 or 24. I used two snippets that worked fine:

    nrow(subset(a, VAL == 23 | VAL == 24))
    nrow(subset(a, VAL %in% c(23, 24)))

But I tried another one that gives an unexpected output, and I don't know why:

    nrow(subset(a, VAL == c(23, 24)))

If I swap 23 and 24, it gives a different, equally unexpected output:

    nrow(subset(a, VAL == c(24, 23)))

Why are those snippets incorrect? What are they actually doing?

Working through an example shows where it is going wrong: a <- data.frame(VAL=c(1,1,1,23
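The failing versions are not membership tests. R compares vectors element by element and recycles the shorter one, so VAL == c(23, 24) tests the 1st, 3rd, 5th, ... values against 23 and the 2nd, 4th, ... values against 24, with a warning when the longer length is not a multiple of the shorter one. That is why swapping the order changes the count. The plain-Python sketch below mimics the recycling rule next to a real membership test, using hypothetical values rather than the question's truncated example; r_equal and r_in are made-up names.

```python
def r_equal(values, targets):
    """Mimic R's `values == targets`: element-wise, recycling `targets`."""
    return [v == targets[i % len(targets)] for i, v in enumerate(values)]


def r_in(values, targets):
    """Mimic R's `values %in% targets`: a membership test per element."""
    return [v in targets for v in values]


val = [23, 23, 24, 24, 23]  # hypothetical VAL column

print(sum(r_equal(val, [23, 24])))  # 3 (matches at positions 1, 4 and 5)
print(sum(r_equal(val, [24, 23])))  # 2 (matches at positions 2 and 3)
print(sum(r_in(val, [23, 24])))     # 5
```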