aggregate | 易学教程

R sum a variable by two groups

Question: I have a data frame in R that generally takes this form:

    ID  Year  Amount
    3   2000  45
    3   2000  55
    3   2002  10
    3   2002  10
    3   2004  30
    4   2000  25
    4   2002  40
    4   2002  15
    4   2004  45
    4   2004  50

I want to sum the Amount by ID for each year and get a new data frame with this output:

    ID  Year  Amount
    3   2000  100
    3   2002  20
    3   2004  30
    4   2000  25
    4   2002  55
    4   2004  95

This is an example of what I need to do; in reality the data is much larger. Please help, thank you!

Answer 1: You can group_by ID and Year then use sum within
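The grouping this question asks for can be sketched in plain Python to make the logic explicit. This is only an illustration of "sum by two keys", not the R answer itself; the function name `sum_by_two_keys` is an assumption introduced here, and the rows are the sample data from the question.

```python
from collections import defaultdict

# Rows from the question as (ID, Year, Amount) tuples.
rows = [
    (3, 2000, 45), (3, 2000, 55), (3, 2002, 10), (3, 2002, 10), (3, 2004, 30),
    (4, 2000, 25), (4, 2002, 40), (4, 2002, 15), (4, 2004, 45), (4, 2004, 50),
]


def sum_by_two_keys(rows):
    """Sum Amount for every distinct (ID, Year) pair (hypothetical helper)."""
    totals = defaultdict(int)
    for id_, year, amount in rows:
        totals[(id_, year)] += amount
    # Sort by (ID, Year) so the result reads like the desired output table.
    return [(id_, year, total) for (id_, year), total in sorted(totals.items())]


result = sum_by_two_keys(rows)
for row in result:
    print(*row)
```

Using the pair (ID, Year) as a single dictionary key is what makes this a two-group aggregation; adding a third grouping column would only mean widening that tuple.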

DDD - persisting aggregate children only if changed

Question: I'm trying to use DDD in an application I'm currently working on. I have the following UserAggregate structure:

    UserAggregate
    - ProfileEntity
    - ImageEntity
    - RatingEntity

I also have a UserRepository which queries the entity mappers to build a UserAggregate. Now I'd like to pass the UserAggregate to the UserRepository for persistence, like UserRepository->save(UserAggregate). How do I tell the UserRepository that the UserAggregate's child entities have changed and need to be saved? Is there any

How to preserve column names when dynamically passing data frame columns to `aggregate`

Question: With a data frame like the one below

    df1 <- data.frame(a=seq(1.1,9.9,1.1), b=seq(0.1,0.9,0.1), c=rev(seq(10.1, 99.9, 11.1)))

I want to aggregate columns b and c by a, so I would do something like this:

    aggregate(cbind(b,c) ~ a, data = df1, mean)

This gets it done. However, I want to generalize it, without hard-coded column names, as in a function:

    myAggFunction <- function (df, col_main, col_1, col_2){
      return (aggregate(cbind(df[,col_1], df[,col_2]) ~ df[,col_main], df, mean))
    }
    myAggFunction(df1, 1, 2, 3)

Calculate Max of Sum of an annotated field over a grouped by query in Django ORM?

Question: To keep it simple, I have four tables (A, B, Category and Relation). The Relation table stores the Intensity of A in B, and Category stores the type of B.

    A <--- Relation ---> B ---> Category

(So the relation between A and B is n to n, while the relation between B and Category is n to 1.) I need an ORM query to group Relation records by Category and A, then calculate the Sum of Intensity in each (Category, A) group (seems simple up to here), and then annotate the Max of the calculated Sums in each Category. My code is
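The two-stage computation described in this question (sum per (Category, A), then the maximum of those sums per Category) can be sketched in plain Python. This is not Django ORM code and makes no claim about how the ORM query should be written; the sample rows and the name `max_of_sums` are hypothetical and exist only to make the intended result concrete.

```python
from collections import defaultdict

# Hypothetical Relation rows flattened to (a, category, intensity).
# In the question's schema the category is reached through B.
relations = [
    ("a1", "cat1", 2), ("a1", "cat1", 3), ("a2", "cat1", 4),
    ("a1", "cat2", 1), ("a2", "cat2", 6), ("a2", "cat2", 1),
]


def max_of_sums(relations):
    """Return (sum per (category, a), max of those sums per category)."""
    sums = defaultdict(int)
    for a, category, intensity in relations:
        sums[(category, a)] += intensity

    max_per_category = {}
    for (category, _a), total in sums.items():
        # Keep the largest per-A sum seen so far for this category.
        max_per_category[category] = max(total, max_per_category.get(category, total))
    return dict(sums), max_per_category


sums, max_per_category = max_of_sums(relations)
print(sums)
print(max_per_category)
```

The point of the sketch is that the Max is taken over already-aggregated values, so whatever query is used has to aggregate twice, with the second aggregation grouped by Category alone.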

Pandas: using multiple functions in a group by

Question: My data has ages, and also payments per month. I'm trying to aggregate by summing the payments, but without summing the ages (averaging would work). Is it possible to use different functions for different columns?

Answer 1: You can pass a dictionary to agg with column names as keys and the functions you want as values.

    import pandas as pd
    import numpy as np

    # Create some randomised data
    N = 20
    date_range = pd.date_range('01/01/2015', periods=N, freq='W')
    df = pd.DataFrame({'ages':np.arange(N),
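A smaller, self-contained sketch of the dictionary form of agg may help here. The data and the grouping column `customer` are assumptions made for illustration; they are not the answer's original example, which is cut off above.

```python
import pandas as pd

# Hypothetical data: two customers, each with an age and a payment per row.
df = pd.DataFrame({
    "customer": ["x", "x", "y", "y"],
    "ages": [30, 32, 40, 44],
    "payments": [100.0, 150.0, 80.0, 20.0],
})

# Keys are column names, values are the aggregation applied to that column.
summary = df.groupby("customer").agg({"payments": "sum", "ages": "mean"})
print(summary)
```

Columns that are not listed in the dictionary are simply left out of the result, which is one way to avoid aggregating a column that has no meaningful sum.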

Group/bin/bucket data in R and get count per bucket and sum of values per bucket

Question: I wish to bucket/group/bin data:

    C1            C2       C3
    49488.01172   0.0512   54000
    268221.1563   0.0128   34399
    34775.96094   0.0128   54444
    13046.98047   0.07241  61000
    2121699.75    0.00453  78921
    71155.09375   0.0181   13794
    1369809.875   0.00453  12312
    750           0.2048   43451
    44943.82813   0.0362   49871
    85585.04688   0.0362   18947
    31090.10938   0.0362   13401
    68550.40625   0.0181   14345

I want to bucket it by C2 values, but I wish to define the buckets myself, e.g. <=0.005, <=.010, <=.014 etc. As you can see, the buckets will have uneven intervals. I
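Bucketing by hand-picked, uneven upper bounds can be sketched in plain Python with the standard `bisect` module. The first three edges come from the question; the remaining edges are hypothetical, because the question only says "etc.". Summing C1 per bucket is also an assumption, since the snippet does not say which column should be summed.

```python
from bisect import bisect_left

# (C1, C2) pairs from the question's sample data.
rows = [
    (49488.01172, 0.0512), (268221.1563, 0.0128), (34775.96094, 0.0128),
    (13046.98047, 0.07241), (2121699.75, 0.00453), (71155.09375, 0.0181),
    (1369809.875, 0.00453), (750, 0.2048), (44943.82813, 0.0362),
    (85585.04688, 0.0362), (31090.10938, 0.0362), (68550.40625, 0.0181),
]

# Upper bounds of the buckets. Only 0.005, 0.010 and 0.014 are from the
# question; the last three edges are hypothetical placeholders.
edges = [0.005, 0.010, 0.014, 0.020, 0.040, 0.080]


def bucket_stats(rows, edges):
    """Return one [count, sum of C1] pair per bucket, plus an overflow bucket."""
    stats = [[0, 0.0] for _ in range(len(edges) + 1)]
    for c1, c2 in rows:
        # bisect_left finds the first edge >= c2, i.e. the "<= edge" bucket.
        i = bisect_left(edges, c2)
        stats[i][0] += 1
        stats[i][1] += c1
    return stats


stats = bucket_stats(rows, edges)
labels = [f"<={e}" for e in edges] + [f">{edges[-1]}"]
for label, (count, total) in zip(labels, stats):
    print(label, count, total)
```

Because the edges are just a sorted list, uneven intervals need no special handling; values above the last edge fall into the extra overflow bucket at the end.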

Converting aggregate operators from SQL to relational algebra

Question: I have several SQL queries written that I want to convert to relational algebra. However, some of the queries use aggregate operators and I don't know how to convert them. Notably, they use the COUNT and GROUP BY ... HAVING operators. Here is the schema:

    Sailors(sid, sname, rating)
    Reserves(sid, bid, price)
    Boats(bid, bname)

Here is an example of what I'm doing: find the bids and bnames of all boats reserved by exactly 2 sailors.

    SELECT B.bid, B.bname FROM Boats B, Reserves R WHERE B.bid = R
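One possible way to write the example ("boats reserved by exactly 2 sailors") is sketched below. It assumes an extended relational algebra with a grouping operator, written here as gamma, which basic relational algebra does not include; the attribute name n is introduced only for illustration. It is not taken from the truncated query above.

```latex
\[
\pi_{bid,\,bname}\Bigl(
  \sigma_{n = 2}\bigl(
    \gamma_{bid,\,bname;\ \mathrm{COUNT}(sid)\,\rightarrow\, n}
      \bigl(Boats \bowtie \pi_{sid,\,bid}(Reserves)\bigr)
  \bigr)
\Bigr)
\]
```

Read inside out: the projection of Reserves onto (sid, bid) removes duplicate reservations under set semantics, the natural join attaches bname, gamma plays the role of GROUP BY with COUNT, the selection n = 2 plays the role of HAVING, and the outer projection corresponds to the SELECT list.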

When and how to use Aggregate Target in Xcode 4

Question: I was looking for an example of using an Aggregate Target in Xcode 4, including its purpose and why a developer should use it. Do you have any reference link, especially from the Apple Developer web site?

Answer 1: Aggregate Target

Xcode defines a special type of target that lets you build a group of targets at once, even if those targets do not depend on each other. An aggregate target has no associated product and no build rules. Instead, an aggregate target depends on each of the targets you

Pandas: apply different functions to different columns

Question: When using df.mean() I get a result where the mean for each column is given. Now let's say I want the mean of the first column and the sum of the second. Is there a way to do this? I don't want to have to disassemble and reassemble the DataFrame. My initial idea was to do something along the lines of pandas.groupby.agg(), like so:

    df = pd.DataFrame(np.random.random((10,2)), columns=['A','B'])
    df.apply({'A':np.mean, 'B':np.sum}, axis=0)
    Traceback (most recent call last):
      File "<ipython-input
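One way to get a different reduction per column without splitting the frame is `DataFrame.agg` with a dictionary. The sketch below uses small fixed values instead of the question's random data so the result is predictable; it is offered as one option, not as the answer the original thread settled on.

```python
import pandas as pd

# Fixed values in place of the question's random data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})

# Map each column name to the reduction wanted for that column.
result = df.agg({"A": "mean", "B": "sum"})
print(result)
```

With one function per column the result is a Series indexed by column name, so `result["A"]` is the mean of A and `result["B"]` is the sum of B.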

aggregating multiple columns in data.table

Question: I have the following sample data.table:

    dtb <- data.table(a=sample(1:100,100), b=sample(1:100,100), id=rep(1:10,10))

I would like to aggregate all columns (a and b, though they should be kept separate) by id using colSums, for example. What is the correct way to do this? The following does not work:

    dtb[,colSums, by="id"]

This is just a sample and my table has many columns, so I want to avoid specifying all of them in the function name.

Answer 1: This is actually what I was looking for and is