subset

split or subset data into 30 minute intervals

谁说胖子不能爱 submitted on 2019-11-27 16:53:27
Question: I have a data frame of the following form:
    Temp Depth Light x time date time.at.depth
    104 18.59 -2.7 27 21:38 2012-06-20 4
    109 18.59 -2.7 27 22:02 2012-06-20 5
    110 18.75 -4.0 27 22:07 2012-06-20 5
    113 18.91 -2.7 27 22:21 2012-06-20 4
    114 18.91 -4.0 27 22:26 2012-06-20 5
    115 18.91 -2.7 27 22:31 2012-06-20 5
    117 18.91 -2.7 27 22:40 2012-06-20 4
    118 18.75 -4.0 27 22:45 2012-06-20 5
    119 18.75 -2.7 27 22:50 2012-06-20 5
    121 18.59 -4.0 27 22:59 2012-06-20 4
    122 18.75 -2.7 27 23:04 2012-06-20 5
    123

Set of all subsets

醉酒当歌 submitted on 2019-11-27 16:07:01
Question: In Python 2 I could use
    def subsets(mySet): return reduce(lambda z, x: z + [y + [x] for y in z], mySet, [[]])
to find all subsets of mySet. Python 3 has removed the built-in reduce. What would be an equally concise rewrite of this for Python 3?
Answer 1: Here's a list of several possible implementations of the power set (the set of all subsets) algorithm in Python. Some are recursive, some are iterative, and some of them don't use reduce. Plenty of options to choose from!
Answer 2: The function reduce() can always be
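As a minimal sketch of what a Python 3 rewrite might look like: reduce still exists in Python 3, but it lives in functools rather than the built-ins, so the original one-liner works once it is imported. The second function is an alternative based on itertools.combinations. The names subsets_reduce and subsets_itertools are illustrative only and do not come from the answers above.

```python
from functools import reduce
from itertools import chain, combinations


def subsets_reduce(items):
    """Same fold as the Python 2 one-liner, with reduce imported from functools."""
    return reduce(lambda acc, x: acc + [s + [x] for s in acc], items, [[]])


def subsets_itertools(items):
    """Power set built by chaining the combinations of every size 0..n."""
    pool = list(items)
    sizes = range(len(pool) + 1)
    return [list(c) for c in chain.from_iterable(combinations(pool, r) for r in sizes)]


print(subsets_reduce([1, 2, 3]))
# [[], [1], [2], [1, 2], [3], [1, 3], [2, 3], [1, 2, 3]]
```

Both functions return the same 2**n subsets, only in a different order. The itertools version groups them by size.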

Return df with a columns values that occur more than once [duplicate]

那年仲夏 submitted on 2019-11-27 15:36:18
This question already has an answer here: Subset data frame based on number of rows per group (2 answers).
I have a data frame df, and I am trying to subset all rows whose value in column B occurs more than once in the dataset. I tried using table to do it, but I am having trouble subsetting from the table:
    t<-table(df$B)
Then I try subsetting it using:
    subset(df, table(df$B)>1)
And I get the error "Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable". How can I subset my data frame using table counts?
Here is a dplyr solution (using mrFlick's data.frame) library

Delete duplicate rows in two columns simultaneously [duplicate]

二次信任 submitted on 2019-11-27 15:34:03
This question already has an answer here: duplicates in multiple columns (2 answers).
I would like to delete duplicate rows based on two columns instead of just one. My input df:
    RAW.PVAL GR allrl Bak
    0.05     fr EN1   B12
    0.05     fg EN1   B11
    0.45     fr EN2   B10
    0.35     fg EN2   B066
My output:
    RAW.PVAL GR allrl Bak
    0.05     fr EN1   B12
    0.45     fg EN2   B10
    0.35     fg EN2   B066
I tried df<- subset(df, !duplicated(allrl, RAW.PVAL)), but it does not delete the rows in which these two columns are duplicated simultaneously. Thank you!
If you want to use subset, you could try: subset(df, !duplicated(subset(df, select=c(allrl, RAW.PVAL)

Keeping only certain rows of a data frame based on a set of values

对着背影说爱祢 submitted on 2019-11-27 15:08:49
I have a data frame with an ID column and a few columns for values. I would like to only keep certain rows of the data frame based on whether or not the value of ID at that row matches another set of values (for instance, called "keep"). For simplicity, here is an example:
    df <- data.frame(ID = sample(rep(letters, each=3)), value = rnorm(n=26*3))
    keep <- c("a", "d", "r", "x")
How can I create a new data frame consisting of rows that only have IDs that match those of keep? I can do this for just one letter by using the which() function, but with multiple letters I get warning messages and

Collapse rows with overlapping ranges

只愿长相守 submitted on 2019-11-27 14:51:09
I have a data.frame with start and end time:
    ranges<- data.frame(start = c(65.72000,65.72187, 65.94312,73.75625,89.61625),stop = c(79.72187,79.72375,79.94312,87.75625,104.94062))
    > ranges
         start      stop
    1 65.72000  79.72187
    2 65.72187  79.72375
    3 65.94312  79.94312
    4 73.75625  87.75625
    5 89.61625 104.94062
In this example, the ranges in rows 2 and 3 are entirely within the range between 'start' on row 1 and 'stop' on row 4. Thus, the overlapping ranges 1-4 should be collapsed to one range:
    > ranges
         start      stop
    1 65.72000  87.75625
    5 89.61625 104.94062
I tried this: mdat <- outer(ranges$start, ranges$stop,
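For illustration only, the collapsing described in the question can be sketched as a sort-and-sweep over (start, stop) pairs. This is a plain Python sketch, not the R approach the asker was attempting, and the function name collapse_ranges is hypothetical. It assumes that a range starting exactly where the previous one stops counts as overlapping.

```python
def collapse_ranges(ranges):
    """Merge overlapping (start, stop) pairs with a single sweep over sorted input."""
    merged = []
    for start, stop in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps the range currently being built: extend its stop if needed.
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            # No overlap: begin a new collapsed range.
            merged.append([start, stop])
    return [tuple(r) for r in merged]


ranges = [
    (65.72000, 79.72187),
    (65.72187, 79.72375),
    (65.94312, 79.94312),
    (73.75625, 87.75625),
    (89.61625, 104.94062),
]
print(collapse_ranges(ranges))
# [(65.72, 87.75625), (89.61625, 104.94062)]
```

Sorting first means each range only has to be compared with the most recently collapsed one, so the sweep is linear after the sort.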

from data table, randomly select one row per group

偶尔善良 submitted on 2019-11-27 14:50:56
I'm looking for an efficient way to select rows from a data table such that I have one representative row for each unique value in a particular column. Let me propose a simple example:
    require(data.table)
    y = c('a','b','c','d','e','f','g','h')
    x = sample(2:10,8,replace = TRUE)
    z = rep(y,x)
    dt = as.data.table( z )
My objective is to subset data table dt by sampling one row for each letter a-h in column z.
OP provided only a single column in the example. Assuming that there are multiple columns in the original dataset, we group by 'z', sample 1 row from the sequence of rows per group, get the

Generate all subsets of size k (containing k elements) in Python

喜夏-厌秋 submitted on 2019-11-27 14:42:52
I have a set of values and would like to create a list of all subsets containing 2 elements. For example, a source set ([1,2,3]) has the following 2-element subsets: set([1,2]), set([1,3]), set([2,3]). Is there a way to do this in Python?
Seems like you want itertools.combinations:
    >>> list(itertools.combinations((1, 2, 3), 2))
    [(1, 2), (1, 3), (2, 3)]
If you want sets you'll have to convert them explicitly. If you don't mind an iterable instead of a list, and you're using Python 3, you can use map:
    >>> s = set((1, 2, 3))
    >>> map(set, itertools.combinations(s, 2))
    <map object at 0x10cdc26d8>
To
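A minimal, self-contained sketch of the itertools.combinations approach quoted above, with the explicit conversion to sets. The helper name k_subsets is an assumption made for illustration. In Python 3, map returns a lazy iterator, so the sketch uses a list comprehension to materialize the result.

```python
from itertools import combinations


def k_subsets(values, k):
    """Return every k-element subset of values as a list of sets."""
    return [set(c) for c in combinations(values, k)]


pairs = k_subsets({1, 2, 3}, 2)
print(pairs)  # three 2-element sets; the order follows the set's iteration order
```

If the subsets need to be hashable, for example to store them in a set of sets, frozenset(c) can be used in place of set(c).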

R subsetting a data frame into multiple data frames based on multiple column values

时光总嘲笑我的痴心妄想 submitted on 2019-11-27 14:35:34
Question: I am trying to subset a data frame so that I get multiple data frames based on multiple column values. Here is my example:
    >df
    v1 v2 v3 v4 v5
    A  Z  1  10 12
    D  Y  10 12 8
    E  X  2  12 15
    A  Z  1  10 12
    E  X  2  14 16
The expected output is something like this, where I am splitting this data frame into multiple data frames based on columns v1 and v2:
    >df1
    v3 v4 v5
    1  10 12
    1  10 12
    >df2
    v3 v4 v5
    10 12 8
    >df3
    v3 v4 v5
    2  12 15
    2  14 16
I have written code that works right now, but I don't think that's the best

How to pass “nothing” as an argument to `[` for subsetting?

吃可爱长大的小学妹 submitted on 2019-11-27 14:23:00
I was hoping to be able to construct a do.call formula for subsetting without having to identify the actual range of every dimension in the input array. The problem I'm running into is that I can't figure out how to mimic the direct function x[,,1:n,], where no entry in the other dimensions means "grab all elements." Here's some sample code, which fails. So far as I can tell, either [ or do.call replaces my NULL list values with 1 for the index.
    x<-array(1:6,c(2,3))
    dimlist<-vector('list', length(dim(x)))
    shortdim<-2
    dimlist[[shortdim]] <- 1: (dim(x)[shortdim] -1)
    flipped <- do.call(`[`,c