subset | 易学教程

Subset data frame to include only levels of one factor that have values in both levels of another factor

I am working with a data frame that deals with numeric measurements. Some individuals have been measured several times, both as juveniles and adults. A reproducible example:

    ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
    age <- rep(c("juvenile", "adult"), each=5)
    size <- rnorm(10)
    # e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
    d <- data.frame(ID, age, size)

My goal is to subset that data frame by selecting the IDs that appear at least once as a juvenile and at least once as an adult. Not sure how to do that..? The resulting dataframe would contain
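One way to read the selection rule above ("at least once as a juvenile and at least once as an adult") is as a set test per ID. The following is only a minimal sketch of that logic in plain Python rather than R, so that it stays self-contained; the names `ages_by_id`, `keep` and `subset` are made up for illustration, and the random `size` column is left out.

```python
from collections import defaultdict

# Same IDs and ages as the R example; rep(..., each=5) gives 5 juveniles, then 5 adults.
ids = ["a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3"]
ages = ["juvenile"] * 5 + ["adult"] * 5
rows = list(zip(ids, ages))

# Collect the set of age classes observed for each ID.
ages_by_id = defaultdict(set)
for id_, age in rows:
    ages_by_id[id_].add(age)

# Keep only IDs seen in both age classes, then filter the rows.
keep = {id_ for id_, seen in ages_by_id.items() if {"juvenile", "adult"} <= seen}
subset = [row for row in rows if row[0] in keep]

print(sorted(keep))
print(subset)
```

With this data, a4 (juvenile only) and a5 and a6 (adult only) are dropped. The same idea carries over to R by counting distinct `age` values per `ID`.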

subset data for a day if data between two hours of the day meets criteria?

I'm fairly new to R and it would be great if you could help out with this problem, as I haven't been able to find any answers to it online. This is part of my data frame (DF) (it goes on until 2008 in this format):

    Counter  Date        Hour  counts
    1245     26/05/2006  0     1
    1245     26/05/2006  100   0
    1245     26/05/2006  200   2
    1245     26/05/2006  300   0
    1245     26/05/2006  400   5
    1245     26/05/2006  500   3
    1245     26/05/2006  600   9
    1245     26/05/2006  700   10
    1245     26/05/2006  800   15

This is my question: I need to subset my data so that between the hours of 600 and 2200 if there are counts over 0 then I need to keep the whole day (000

Select groups with more than one distinct value per group [duplicate]

This question already has answers here: Select groups with more than one distinct value (3 answers). Closed 4 years ago.

I have data like below:

    ID  category  class
    1   a         m
    1   a         s
    1   b         s
    2   a         m
    3   b         s
    4   c         s
    5   d         s

I want to subset the data by only including those "ID" which have several (> 1) different categories. My expected output:

    ID  category  class
    1   a         m
    1   a         s
    1   b         s

Is there a way to do so? I tried

    library(dplyr)
    df %>% group_by(ID) %>% filter(n_distinct(category, class) > 1)

But it gave me an error:

    # Error: expecting a single value

Using data.table

    library(data.table)
    #see: https://github
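The rule described above (keep an ID only if it has more than one distinct category) can be sketched as a two-pass filter. This is a plain-Python illustration of the logic, not the R answer from the linked duplicate; `categories_by_id` and `result` are names invented for the sketch.

```python
from collections import defaultdict

# (ID, category, class) rows copied from the question.
rows = [
    (1, "a", "m"),
    (1, "a", "s"),
    (1, "b", "s"),
    (2, "a", "m"),
    (3, "b", "s"),
    (4, "c", "s"),
    (5, "d", "s"),
]

# First pass: collect the distinct categories seen for each ID.
categories_by_id = defaultdict(set)
for id_, category, _class in rows:
    categories_by_id[id_].add(category)

# Second pass: keep every row whose ID has more than one distinct category.
result = [row for row in rows if len(categories_by_id[row[0]]) > 1]

print(result)
```

Only ID 1 has two categories (a and b), so its three rows are the ones kept, which matches the expected output in the question.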

Subset a data frame for each factor level [duplicate]

This question already has an answer here: Split/subset a data frame by factors in one column [duplicate] (1 answer). Closed 3 years ago.

Given the dataset red_wine_data below, how can I create the list l which contains the following four subsetted data frames for all values in unique(red_wine_data$condition)? I'm looking for a flexible and dynamic solution that produces a result similar to these hard-coded commands, but that will work for any similar data frame even if the factor levels change.

    l[["red_usa"]] <- subset(red_wine_data, red_wine_data$condition=="USA")
    l[["red_france"]] <-

Coq: Defining a subtype

I have a type, say

    Inductive Tt := a | b | c.

What's the easiest and/or best way to define a subtype of it? Suppose I want the subtype to contain only constructors a and b. A way would be to parametrize on a two-element type, e.g. bool:

    Definition filt (x:bool): Tt := match x with | true => a | false => b end.
    Check filt true: Tt.

This works but is very awkward if your expression has several (possibly interdependent) subtypes defined this way. Besides, it works only half way, as no subtype is defined. For this I must additionally define e.g.

    Notation _Tt := ltac: (let T := type of (forall {x

MATLAB - extract selected rows in a table based on some criterion

Let's say I have a table like this:

    post  user  date
    ____  ____  ________________
    1     A     12.01.2014 13:05
    2     B     15.01.2014 20:17
    3     A     16.01.2014 05:22

I want to create a smaller table (but not delete the original one!) containing all posts of, for example, user A including the dates that those were posted on. When looking at MATLAB's documentation (see the very last part for deleting rows) I discovered that MATLAB allows you to create a mask for a table based on some criterion. So in my case if I do something like this:

    postsA = myTable.user == 'A'

I get a nice mask vector as follows:

    >> postsA = 1 0

Concatenate expressions to subset a dataframe

I am attempting to create a function that will calculate the mean of a column in a subsetted dataframe. The trick here is that I always want to have a couple of subsetting conditions and then have the option to pass more conditions to the function to further subset the dataframe. Suppose my data look like this:

    dat <- data.frame(var1 = rep(letters, 26), var2 = rep(letters, each = 26), var3 = runif(26^2))
    head(dat)
      var1 var2      var3
    1    a    a 0.7506109
    2    b    a 0.7763748
    3    c    a 0.6014976
    4    d    a 0.6229010
    5    e    a 0.5648263
    6    f    a 0.5184999

I want to be able to do the subset shown below, using the first

Filter where there are at least two pattern matches

Question: I have a lot of text data in a data.table. I have several text patterns that I'm interested in. I want to subset the table so it shows text that matches at least two of the patterns. This is further complicated by the fact that some of the patterns already are an either/or, for example something like "paul|john". I think I either want an expression that would mean directly to subset on that basis, or alternatively if I could count the number of times the patterns occur I could then use that
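The counting approach mentioned above can be sketched as follows: test each pattern once per text, count how many patterns hit, and keep texts with a count of at least two. This is a hedged illustration in plain Python rather than data.table. Only "paul|john" comes from the question; the other patterns and all sample texts are hypothetical.

```python
import re

# "paul|john" is from the question; "george" and "ringo" are hypothetical extras.
patterns = ["paul|john", "george", "ringo"]
compiled = [re.compile(p) for p in patterns]

# Hypothetical sample texts, standing in for the text column.
texts = [
    "paul met george",
    "john and paul",
    "ringo alone",
    "george, ringo and john",
]


def count_matching_patterns(text):
    """Count how many patterns match; an either/or pattern counts once."""
    return sum(1 for rx in compiled if rx.search(text))


# Keep texts that match at least two different patterns.
matches = [t for t in texts if count_matching_patterns(t) >= 2]

print(matches)
```

Note that "john and paul" is not kept: both names fall under the single either/or pattern, so it counts as one match only.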

Extracting rows of group medians in R

If I have a data frame like the following:

    v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5)
    v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2)
    lvl <- c("a","a","a","b","b","b","c","c","c")
    d <- data.frame(v1,v2,lvl)
    > d
       v1  v2 lvl
    1 2.2 4.5   a
    2 3.2 2.5   a
    3 1.2 3.5   a
    4 4.2 5.5   b
    5 2.2 7.5   b
    6 3.2 6.5   b
    7 2.2 2.5   c
    8 1.2 1.5   c
    9 5.2 3.5   c

Within each level of d$lvl, I want to extract the row with value of d$v1 being median (for the simplest case, each level of d$lvl has three rows). So I want to get:

       v1  v2 l
    1 2.2 4.5 a
    6 3.2 6.5 b
    7 2.2 2.5 c

For groups with odd number of rows this works.
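The per-group extraction described above can be sketched by grouping rows by level and picking the row whose `v1` equals the group median. This is a minimal plain-Python illustration, not the R solution. Using `statistics.median_low` is an assumption of the sketch: it always returns an actual data point, so even-sized groups would resolve to the lower of the two middle values, which may or may not be what the asker wants.

```python
from collections import defaultdict
from statistics import median_low

# Same vectors as in the question.
v2 = [4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5]
v1 = [2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2]
lvl = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
rows = list(zip(v1, v2, lvl))

# Group the rows by level, preserving first-seen order of the levels.
groups = defaultdict(list)
for row in rows:
    groups[row[2]].append(row)

# For each level, pick the first row whose v1 is the (low) median of that level.
median_rows = []
for level, members in groups.items():
    target = median_low([m[0] for m in members])
    median_rows.append(next(m for m in members if m[0] == target))

print(median_rows)
```

For the three-row groups in the question this selects rows 1, 6 and 7 of `d`, matching the expected output.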

How to subset data using multidimensional coordinates using python xarray?

I have a netcdf file that uses multidimensional coordinates. My xarray dataset looks like this

    <xarray.Dataset>
    Dimensions: (Time: 48, bottom_top: 50, bottom_top_stag: 51, soil_layers_stag: 4, south_north: 1015, south_north_stag: 1016, west_east: 1359, west_east_stag: 1360)
    Coordinates:
        XLAT     (Time, south_north, west_east) float32 18.1363 18.1456 ...
        XLAT_U   (Time, south_north, west_east_stag) float32 18.1316 ...
        XLAT_V   (Time, south_north_stag, west_east) float32 18.1198 ...
        XLONG    (Time, south_north, west_east) float32 -122.884 ...
        XLONG_U  (Time, south_north, west_east_stag) float32 -122.901 ...