subset | 易学教程

Remove group from data.frame if at least one group member meets condition

Question: I have a data.frame where I'd like to remove entire groups if any of their members meets a condition. In this first example, where the values are numbers and the condition is NA, the code below works.
  library(plyr)  # provides ddply
  df <- structure(list(world = c(1, 2, 3, 3, 2, NA, 1, 2, 3, 2), place = c(1, 1, 2, 2, 3, 3, 1, 2, 3, 1), group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)), .Names = c("world", "place", "group"), row.names = c(NA, -10L), class = "data.frame")
  ans <- ddply(df, .(group), summarize, code=mean(world))
  ans$code[is

R: How to filter/subset a sequence of dates

Question: I have this data (complete for December):
          date sessions
  1 2014-12-01     1932
  2 2014-12-02     1828
  3 2014-12-03     2349
  4 2014-12-04     8192
  5 2014-12-05     3188
  6 2014-12-06     3277
I need to subset/filter this, for example from "2014-12-05" to "2014-12-25". I know that you can create a sequence with the operator ":". Example: b <- c(1:5) But how do I filter a sequence? I tried this: NewDate <- filter(Dates, date("2014-12-05":"2014-12-12")) But it says: Error: unexpected symbol in: "NewDate <- filter(Dates, date(

Why does dplyr's filter drop NA values from a factor variable?

Question: When I use filter from the dplyr package to drop a level of a factor variable, filter also drops the NA values. Here's an example:
  library(dplyr)
  set.seed(919)
  (dat <- data.frame(var1 = factor(sample(c(1:3, NA), size = 10, replace = T))))
  #    var1
  # 1  <NA>
  # 2     3
  # 3     3
  # 4     1
  # 5     1
  # 6  <NA>
  # 7     2
  # 8     2
  # 9  <NA>
  # 10    1
  filter(dat, var1 != 1)
  #   var1
  # 1    3
  # 2    3
  # 3    2
  # 4    2
This does not seem ideal -- I only wanted to drop rows where var1 == 1. It looks like this is occurring because any comparison

Select groups with more than one distinct value

I have data with a grouping variable ("from") and values ("number"):
  from number
     1      1
     1      1
     2      1
     2      2
     3      2
     3      2
I want to subset the data and select groups which have two or more unique values. In my data, only group 2 has more than one distinct 'number', so this is the desired result:
  from number
     2      1
     2      2
There are several possibilities; here's my favorite:
  library(data.table)
  setDT(df)[, if(+var(number)) .SD, by = from]
  #    from number
  # 1:    2      1
  # 2:    2      2
Basically, for each group we check whether there is any variance; if TRUE, we return the group's values. With base R, I would go with df[as.logical(with(df,

Subset and ggplot2

I have a problem plotting a subset of a data frame with ggplot2. My df looks like this:
  ID Value1 Value2
  P1    100     12
  P1    120     13
  ...
  P2    300     11
  P2    400     16
  ...
  P3    130     15
  P3    140     12
  ...
How can I plot Value1 vs Value2 only for IDs P1 and P3? For example, I tried: ggplot(subset(df,ID=="P1 & P3") + geom_line(aes(Value1, Value2, group=ID, colour=ID))) but I always receive an error. P.S. I also tried many combinations with P1 & P3, but they always failed. Here are 2 options for subsetting. Using subset from base R: library(ggplot2) ggplot(subset(dat,ID %in% c("P1" , "P3"))) + geom_line(aes(Value1, Value2, group=ID,

In R, how do I subset a data.frame by values from another data.frame?

Question: I have two data frames. The first, df.1, contains two columns of paired numerical identifiers, where each column includes ~100,000 rows. The second data frame, df.2, includes one column (df.2$C) of numerical identifiers. This data frame has around 200 rows. How can I find the paired subset of data of df.1 that includes only the rows with values of the identifiers found in df.2$C? The final subset would include the paired data of df.1 which corresponds to identifiers found in df.2$C that

Subsetting R data frame results in mysterious NA rows

I've been encountering what I think is a bug. It's not a big deal, but I'm curious if anyone else has seen this. Unfortunately, my data is confidential, so I have to make up an example, and it's not going to be very helpful. When subsetting my data, I occasionally get mysterious NA rows that aren't in my original data frame. Even the rownames are NA. E.g.:
  example <- data.frame("var1"=c("A", "B", "A"), "var2"=c("X", "Y", "Z"))
  example
    var1 var2
  1    A    X
  2    B    Y
  3    A    Z
Then I run:
  example[example$var1=="A",]
     var1 var2
  1     A    X
  3     A    Z
  NA <NA> <NA>
Of course, the example above does not actually give you this

Generate all subsets of size k (containing k elements) in Python

Question: I have a set of values and would like to create a list of all subsets containing 2 elements. For example, a source set ([1,2,3]) has the following 2-element subsets: set([1,2]), set([1,3]), set([2,3]) Is there a way to do this in Python?
Answer 1: Seems like you want itertools.combinations:
  >>> list(itertools.combinations((1, 2, 3), 2))
  [(1, 2), (1, 3), (2, 3)]
If you want sets you'll have to convert them explicitly. If you don't mind an iterable instead of a list, and you're using Python 3, you can
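
The answer excerpt above notes that itertools.combinations yields tuples and that sets need an explicit conversion. A minimal sketch of that conversion, assuming Python 3, might look like the following; the helper name subsets_of_size is hypothetical and does not come from the original answer.

```python
import itertools


def subsets_of_size(values, k):
    """Return every k-element subset of `values` as a list of sets.

    Hypothetical helper for illustration; assumes the items in
    `values` are hashable and distinct.
    """
    # combinations() yields tuples, so each one is converted to a set explicitly
    return [set(combo) for combo in itertools.combinations(values, k)]


pairs = subsets_of_size([1, 2, 3], 2)
# Sets are unordered, so sort each one for a stable display
print([sorted(p) for p in pairs])  # [[1, 2], [1, 3], [2, 3]]
```

For large inputs it may be preferable to keep the result lazy by returning a generator expression instead of a list.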

Check if list<t> contains any of another list

Question: I have a list of parameters like this: public class Parameter { public string name {get; set;} public string paramtype {get; set;} public string source {get; set;} } IEnumerable<Parameter> parameters; And an array of strings I want to check it against. string[] myStrings = new string[] { "one", "two"}; I want to iterate over the parameter list and check if the source property is equal to any of the strings in the myStrings array. I can do this with nested foreach loops, but I would like to learn how to do it in a

How do I extract a single column from a data.frame as a data.frame? [duplicate]

This question already has an answer here: How to subset matrix to one column, maintain matrix data type, maintain row/column names? (1 answer)
Say I have a data.frame:
  df <- data.frame(A=c(10,20,30),B=c(11,22,33), C=c(111,222,333))
     A  B   C
  1 10 11 111
  2 20 22 222
  3 30 33 333
If I select two (or more) columns I get a data.frame:
  x <- df[,1:2]
     A  B
  1 10 11
  2 20 22
  3 30 33
This is what I want. However, if I select only one column I get a numeric vector:
  x <- df[,1]
  [1] 10 20 30
I have tried to use as.data.frame(), which does not change the results for two or more columns. It does return a data.frame in