aggregate

Ok to provide constructor + trivial operators for behaviorless aggregates?

Submitted by 烈酒焚心 on 2019-12-11 14:11:49
Question: This is a follow-up question to 2043381. Consider the following:

struct DataBundle {
    std::string name;
    int age;

    DataBundle() : age(0) {}
    DataBundle(const std::string& name, int age) : name(name), age(age) {}

    void swap(DataBundle& rhs) { name.swap(rhs.name); std::swap(age, rhs.age); }

    DataBundle& operator=(DataBundle rhs) { swap(rhs); return *this; }

    bool operator==(const DataBundle& rhs) const { return (name == rhs.name) && (age == rhs.age); }
    bool operator!=(const DataBundle& rhs) const { return !(

R: how to aggregate by a real-valued column with a given error tolerance

Submitted by 我与影子孤独终老i on 2019-12-11 14:06:20
Question: Assuming I have a data frame:

t <- data.frame(d1 = c(694, 695, 696, 2243, 2244, 2651, 2652),
                d2 = c(1.80950881, 1.80951007, 1.80951052, 1.46499982,
                       1.46500087, 1.14381419, 1.14381319))

    d1       d2
1  694 1.809509
2  695 1.809510
3  696 1.809511
4 2243 1.465000
5 2244 1.465001
6 2651 1.143814
7 2652 1.143813

I'd like to group the rows by the real values in column d2, which are very close but not exactly equal. Thus, in this example, after aggregation, I'd like to obtain the following data set:

   d1       d2
1 694 1
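The question is cut off above. A minimal sketch of one way to approach this in base R, assuming that grouping values which agree after rounding to a fixed number of decimals is close enough for the intended tolerance (the rounding precision and the choice to keep the first row of each group are assumptions, not part of the original question):

t <- data.frame(d1 = c(694, 695, 696, 2243, 2244, 2651, 2652),
                d2 = c(1.80950881, 1.80951007, 1.80951052, 1.46499982,
                       1.46500087, 1.14381419, 1.14381319))

# Rows whose d2 values agree to 3 decimal places fall into the same group;
# keep the first row of each group as its representative.
key <- round(t$d2, 3)
aggregate(t, by = list(d2_group = key), FUN = function(x) x[1])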

How can I skip groups while subsetting with key by in data.table?

Submitted by 帅比萌擦擦* on 2019-12-11 14:06:17
Question: I have this DT:

dt = data.table(ID = c(rep(letters[1:2], each = 4), 'b'), value = seq(1, 9))

   ID value
1:  a     1
2:  a     2
3:  a     3
4:  a     4
5:  b     5
6:  b     6
7:  b     7
8:  b     8
9:  b     9

I need to eliminate groups while subsetting, but only when the data fulfils some condition. Something like this does not work:

dt[, {if (.N == 4) .SD else NULL v1}, by = "ID"]

I need to remove the groups that do not meet the condition; in this example I would like to skip the groups whose length is different from 4, so that I get:

ID value
1
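A minimal sketch of the usual data.table idiom for dropping groups by size (keeping only groups with exactly 4 rows follows the example above; the stray v1 from the attempted code is left out):

library(data.table)

dt <- data.table(ID = c(rep(letters[1:2], each = 4), 'b'), value = seq(1, 9))

# .N is the number of rows in the current group; when the if has no else,
# groups that fail the condition return NULL and are dropped from the result.
dt[, if (.N == 4) .SD, by = ID]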

Terms aggregation based on unique key

Submitted by 扶醉桌前 on 2019-12-11 13:42:02
Question: I have an index full of documents. Each of them has a key "userid" with a distinct value per user, but each user may have multiple documents. Each user has additional properties (like "color", "animal"). I need to get the aggregation counts per property, which would be:

aggs: {
  colors: { terms: { field: color } },
  animals: { terms: { field: animal } }
}

But I need these counts per unique userid, maybe:

aggs: {
  group-by: { field: userid },
  sub-aggs: {
    colors: { terms: { field: color } },
    animals: {

Applying aggregate functions to multiple properties with LINQ GroupBy

Submitted by |▌冷眼眸甩不掉的悲伤 on 2019-12-11 12:28:48
Question: I have a list of Object (it's called sourceList). Object contains: Id, Num1, Num2, Num3, Name, Lname. Assume I have the following list:

1, 1, 5, 9, 'a', 'b'
1, 2, 3, 2, 'b', 'm'
2, 5, 8, 7, 'r', 'a'

How can I return another list (of object2) containing: Id, sum of Num1, sum of Num2? For the example above, it should return a list of object2 that contains:

1, 3, 8
2, 5, 8

I tried:

Dim a = sourceList.GroupBy(Function(item) item.Id).
        Select(Function(x) x.Sum(Function(y) y.Num1))

How to group time by every n minutes in R

Submitted by 為{幸葍}努か on 2019-12-11 12:26:37
Question: I have a dataframe with a lot of time series:

1  0:03 B 1
2  0:05 A 1
3  0:05 A 1
4  0:05 B 1
5  0:10 A 1
6  0:10 B 1
7  0:14 B 1
8  0:18 A 1
9  0:20 A 1
10 0:23 B 1
11 0:30 A 1

I want to group the time series into 6-minute bins and count the frequency of A and B:

1 0:06 A 2
2 0:06 B 2
3 0:12 A 1
4 0:12 B 1
5 0:18 A 1
6 0:24 A 1
7 0:24 B 1
8 0:18 A 1
9 0:30 A 1

Also, the class of the time series is character. What should I do?

Answer 1: Here's an approach to convert times to POSIXct, cut the times by 6
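The answer is cut off above; a minimal sketch along the same lines (convert the character times to POSIXct and cut them into 6-minute bins) follows. The column names time and id, and the use of base aggregate for the counts, are assumptions:

df <- data.frame(time = c("0:03", "0:05", "0:05", "0:05", "0:10", "0:10",
                          "0:14", "0:18", "0:20", "0:23", "0:30"),
                 id   = c("B", "A", "A", "B", "A", "B", "B", "A", "A", "B", "A"),
                 stringsAsFactors = FALSE)

# Parse the character times as POSIXct (the date part is irrelevant here),
# bin them into 6-minute intervals, and count how often each id occurs per bin.
tm  <- as.POSIXct(df$time, format = "%H:%M")
bin <- cut(tm, breaks = "6 min")
aggregate(list(count = df$id), by = list(interval = bin, id = df$id), FUN = length)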

Aggregating data based on unique triads in R

Submitted by 拟墨画扇 on 2019-12-11 12:10:58
Question: I was referred to "Counting existing permutations in R" for a previous related question, but I can't apply it to my problem. Here is the data I have:

One <- c(rep("X", 6), rep("Y", 3), rep("Z", 2))
Two <- c(rep("A", 4), rep("B", 6), rep("C", 1))
Three <- c(rep("J", 5), rep("K", 2), rep("L", 4))
Number <- runif(11)
df <- data.frame(One, Two, Three, Number)

   One Two Three     Number
1    X   A     J 0.10511669
2    X   A     J 0.62467760
3    X   A     J 0.24232663
4    X   A     J 0.38358854
5    X   B     J 0.04658226
6    X   B     K 0.26789844
7    Y   B     K 0.07685341
8    Y   B     L
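The question is cut off, but a minimal sketch of aggregating by the unique (One, Two, Three) triads follows; summing Number per triad is an assumption about what is wanted:

One <- c(rep("X", 6), rep("Y", 3), rep("Z", 2))
Two <- c(rep("A", 4), rep("B", 6), rep("C", 1))
Three <- c(rep("J", 5), rep("K", 2), rep("L", 4))
Number <- runif(11)
df <- data.frame(One, Two, Three, Number)

# One row per unique (One, Two, Three) triad, with Number summed over
# all rows belonging to that triad.
aggregate(Number ~ One + Two + Three, data = df, FUN = sum)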

How to correctly use pandas agg function when running groupby on a column of type timestamp/datetime/datetime64?

Submitted by  ̄綄美尐妖づ on 2019-12-11 11:56:22
Question: I'm trying to understand why calling count() directly on a group returns the correct answer (in this example, 2 rows in that group), but calling count via a lambda in the agg() function returns the beginning of epoch ("1970-01-01 00:00:00.000000002").

# Using groupby(lambda x: True) in the code below just as an illustrative example.
# It will always create a single group.
x = DataFrame({'time': [np.datetime64('2005-02-25'), np.datetime64('2006-03-30')]}).groupby(lambda x: True)
display(x

LINQ Aggregate function

Submitted by 白昼怎懂夜的黑 on 2019-12-11 11:53:33
Question: I have a List like "test", "bla", "something", "else". But when I use Aggregate on it and at the same time call a function, it seems that after 2 'iterations' the result of the first gets passed in? I am using it like:

myList.Aggregate((current, next) => someMethod(current) + ", " + someMethod(next));

and when I put a breakpoint in the someMethod function, where some transformation on the information in myList occurs, I notice that after the 3rd call I get a result from a former

Aggregate 5-minute data to hourly sums with NAs present

Submitted by 做~自己de王妃 on 2019-12-11 10:34:14
Question: My problem is as follows: I've got a time series with 5-minute precipitation data like:

                Datum mm
1 2004-04-08 00:05:00 NA
2 2004-04-08 00:10:00 NA
3 2004-04-08 00:15:00 NA
4 2004-04-08 00:20:00 NA
5 2004-04-08 00:25:00 NA
6 2004-04-08 00:30:00 NA

with this structure:

'data.frame': 1098144 obs. of 2 variables:
 $ Datum: POSIXlt, format: "2004-04-08 00:05:00" "2004-04-08 00:10:00" "2004-04-08 00:15:00" "2004-04-08 00:20:00" ...
 $ mm   : num NA NA NA NA NA NA NA NA NA NA ...

As you can see, the
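The question is cut off above; a minimal sketch of aggregating such 5-minute values to hourly sums in base R follows. Which hour a reading stamped exactly on the hour should belong to, and how hours consisting only of NAs should be reported, are left as assumptions:

df <- data.frame(
  Datum = seq(as.POSIXct("2004-04-08 00:05:00"), by = "5 min", length.out = 24),
  mm    = c(rep(NA, 12), runif(12))
)

# Truncate each timestamp to the hour it falls in and sum the 5-minute values.
# na.rm = TRUE ignores missing readings; an hour containing only NAs then sums
# to 0 and may need to be reset to NA afterwards, depending on the requirement.
df$hour <- as.POSIXct(trunc(df$Datum, units = "hours"))
hourly  <- aggregate(mm ~ hour, data = df, FUN = sum, na.rm = TRUE, na.action = na.pass)
hourly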