aggregate | 易学教程

Groupby bins and aggregate in R

Question: I have data with columns (a, b, c):

    a b c
    1 2 1
    2 3 1
    9 2 2
    1 6 2

The range of 'a' is divided into n (say 3) equal parts, an aggregate function (say max) is applied to the b values, and the result is also grouped by 'c'. So the output looks like:

    a_bin  b_m(c=1)  b_m(c=2)
    1-3    3         6
    4-6    NaN       NaN
    7-9    NaN       2

This is an M x N table, where M = number of a bins and N = number of unique c values (or the whole range). How do I approach this? Can any R package help me?

Answer 1: There would be easier ways. If your dataset is dat: res <- sapply(split(dat[, -3], dat$c),
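The binning the question describes can be sketched outside R as well. The following is a minimal, language-neutral illustration in Python, not the truncated R answer above. It assumes that 'a' is integer-valued, so that the range 1..9 splits into the bins 1-3, 4-6 and 7-9; the names `bin_index`, `table` and `labels` are made up for this sketch.

```python
import math

# (a, b, c) rows copied from the question.
rows = [(1, 2, 1), (2, 3, 1), (9, 2, 2), (1, 6, 2)]
n_bins = 3

a_min = min(a for a, _, _ in rows)
a_max = max(a for a, _, _ in rows)
# Assumption: integer-valued 'a', so each bin covers `width` consecutive integers.
width = (a_max - a_min + 1) / n_bins


def bin_index(a):
    """Return the 0-based bin that value `a` falls into."""
    return min(int((a - a_min) // width), n_bins - 1)


# Track the maximum b for every (bin, c) pair that actually occurs.
best = {}
for a, b, c in rows:
    key = (bin_index(a), c)
    best[key] = max(best.get(key, b), b)

c_values = sorted({c for _, _, c in rows})
labels = [
    f"{int(a_min + i * width)}-{int(a_min + (i + 1) * width - 1)}"
    for i in range(n_bins)
]
# M x N table: one row per bin, one column per c value, NaN where no data exists.
table = [[best.get((i, c), math.nan) for c in c_values] for i in range(n_bins)]

for label, row in zip(labels, table):
    print(label, row)
```

Combinations with no data stay NaN, which matches the empty 4-6 row in the expected output.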

Check command for validity with data from other aggregate

Question: I am currently working on my first bigger DDD application. For now it works pretty well, but we have been stuck with an issue since the early days that I cannot stop thinking about: in some of our aggregates we keep references to another aggregate root that is pretty essential for the whole application (the references are based on IDs, so there are no hard references; deletion is also based on events/eventual consistency). Now when we create a new entity "Entity1", we send a new CreateEntity1Command that

Aggregate raster in R with NA values

Question: I have a 1 km resolution raster in R with widespread NA values at irregular locations (i.e. the cells with data are not contiguous and have NA values scattered throughout). I am trying to aggregate this raster (using the aggregate() command in the {raster} package) to, say, 5 km resolution (factor=5) with a user-defined function for averaging circular angles (included below). As of now, I can't figure out how to get aggregate() (or my function, if that's the problem) to provide a result

Finding the maximum of minimum values

Question: For each row in a spreadsheet (Google Sheets, specifically), I would like to find the minimum value that is greater than 0, and then calculate the maximum of those row minimums. I hope that makes sense. My data is:

    0  6  7  8  1
    0 12 21 22 21
    0 10 18 24
    0  7  9  1 17
    0 16 16 20

So, I want an ArrayFormula of some sort that will generate:

    1
    12
    10
    1
    16

Of which I could then get the maximum. I've read and experienced that the obvious solution doesn't work, which is: =max(ArrayFormula(min(if(A:Z>0,A:Z,"")))) The reason being the
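To make the target computation concrete, here is a minimal sketch in Python rather than a Google Sheets formula. It assumes the flattened data splits into five rows that each begin with 0, which is the only split consistent with the expected result 1, 12, 10, 1, 16; the variable names are illustrative.

```python
# Rows reconstructed from the question (assumed split: each row starts with 0).
rows = [
    [0, 6, 7, 8, 1],
    [0, 12, 21, 22, 21],
    [0, 10, 18, 24],
    [0, 7, 9, 1, 17],
    [0, 16, 16, 20],
]

# Smallest strictly positive value in each row; rows with no positive value are skipped.
row_mins = [
    min(v for v in row if v > 0)
    for row in rows
    if any(v > 0 for v in row)
]

# The quantity the question is ultimately after.
max_of_mins = max(row_mins)

print(row_mins)
print(max_of_mins)
```

The key point is that the minimum is taken per row first, and only then is a single maximum taken over those per-row results.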

Calculate means across elements in a list

Question: I have a list like this:

    (mylist <- list(a = data.frame(x = c(1, 2), y = c(3, 4)),
                    b = data.frame(x = c(2, 3), y = c(4, NA)),
                    c = data.frame(x = c(3, 4), y = c(NA, NA))))

    $a
      x y
    1 1 3
    2 2 4

    $b
      x  y
    1 2  4
    2 3 NA

    $c
      x  y
    1 3 NA
    2 4 NA

which is created by purrr::map(). How can I calculate the means of the values in the corresponding cells? i.e.

      x   y
    1 2 3.5
    2 3 4

where

    mean(c(1, 2, 3), na.rm = T)   # = 2
    mean(c(2, 3, 4), na.rm = T)   # = 3
    mean(c(3, 4, NA), na.rm = T)  # = 3.5
    mean(c(4, NA, NA), na.rm = T)
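The cell-wise mean with missing values dropped can be sketched in plain Python as follows. This illustrates the requested computation and is not an R solution; `None` stands in for R's NA, and `cell_mean` is a hypothetical helper name.

```python
# The three data frames from the question, as column-name -> values mappings.
mylist = {
    "a": {"x": [1, 2], "y": [3, 4]},
    "b": {"x": [2, 3], "y": [4, None]},
    "c": {"x": [3, 4], "y": [None, None]},
}


def cell_mean(values):
    """Mean of the non-missing values, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


n_rows = 2
# For each column and row position, average that cell across all list elements.
means = {
    col: [cell_mean([df[col][i] for df in mylist.values()]) for i in range(n_rows)]
    for col in ("x", "y")
}

print(means)
```

Dropping missing values per cell is what makes the lower-right cell equal to 4 instead of missing.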

select aggregate function and all other columns

Question: How do I select all columns in a table together with an aggregate function in a convenient way? I.e. say that I have a table with 100 columns, and I want to send the following: SELECT Max(Columns 44), ALL OTHER COLUMNS FROM zz Group by ALL OTHER COLUMNS. Thanks!

Answer 1: To select all columns from the table: select * from zz; To select a maximum from the table: select max(column44) from zz; The two combined: select zz.*, (select max(column44) from zz) as maxcol44 from zz; If you want to omit column44 in
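The combined query from the answer can be tried against an in-memory SQLite database. This is a minimal sketch: the table name zz and the column name column44 come from the excerpt, while the other columns (`id`, `name`) and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column stand-in for the 100-column table in the question.
conn.execute("CREATE TABLE zz (id INTEGER, name TEXT, column44 INTEGER)")
conn.executemany(
    "INSERT INTO zz VALUES (?, ?, ?)",
    [(1, "a", 10), (2, "b", 30), (3, "c", 20)],
)

# Every column of every row, plus the table-wide maximum of column44.
rows = conn.execute(
    "SELECT zz.*, (SELECT max(column44) FROM zz) AS maxcol44 "
    "FROM zz ORDER BY id"
).fetchall()

for row in rows:
    print(row)

conn.close()
```

Note that the scalar subquery returns the maximum over the whole table, so every row carries the same maxcol44 value; it is not a per-group maximum.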

Aggregating minutes to hour demand

Question: I don't know if I am in the right section for this question. I've looked around and did not find an answer, so here is my question: I have a CSV file ordered as follows:

    dat <- read.csv(text="Date,Demand
    01/01/2012 00:00:00,5061.5
    01/01/2012 00:05:00,5030.0
    01/01/2012 00:10:00,5011.5
    01/01/2012 00:15:00,4983.5
    01/01/2012 00:20:00,4963.4
    01/01/2012 00:25:00,4980.6
    01/01/2012 00:30:00,4969.4
    01/01/2012 00:35:00,4961.7
    01/01/2012 00:40:00,4929.0
    01/01/2012 00:45:00,4907.1
    01/01/2012 00:50:00,4892

Pivot using multiple columns

Question: I have a data set with 5 columns:

    store_id  year  event   item  units
    123       2015  sale_2  abc   2
    234       2015  sale_3  def   1
    345       2015  sale_2  xyz   5

I'm trying to pivot the items into columns, grouped by store_id, year, and event, with the sum of units as the value. For instance:

    store_id  year  event   abc  def  xyz
    123       2015  sale_2  7    0    0
    234       2015  sale_2  2    1    0

I'm having trouble figuring out the best method. Normally I'd use dummyVars in caret to do this, but I need sums instead of flags. I've looked at tapply but it can't handle more than 2 grouping

R: Summarize rows per month

Question: I have made a dataframe which has a column with dates and columns with numeric values. I want to group this dataframe by month and sum all the numeric values in the other columns for each month. Here is my dataframe example:

    capture.date  Test1  Test2  Test3
    2016-03-18    0      1      1
    2016-03-18    1      1      1
    2016-03-20    2      1      1
    2016-04-12    1      0      1

I already tried some code: df %>% group_by(capture.date) %>% summarise_each(funs(sum)) and: aggregate(df[2:4], by=df["capture.date"], sum) but both
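Both attempts in the question group by the full date. The month-level grouping can be illustrated with a short Python sketch, which is not an R answer: it derives a year-month key from each date and sums the numeric columns under that key. The key format "%Y-%m" and the variable names are choices made for this sketch.

```python
from collections import defaultdict
from datetime import date

# (capture.date, Test1, Test2, Test3) rows copied from the question.
rows = [
    (date(2016, 3, 18), 0, 1, 1),
    (date(2016, 3, 18), 1, 1, 1),
    (date(2016, 3, 20), 2, 1, 1),
    (date(2016, 4, 12), 1, 0, 1),
]

# One running total per numeric column for each year-month key.
totals = defaultdict(lambda: [0, 0, 0])
for captured, *values in rows:
    month_key = captured.strftime("%Y-%m")
    for i, value in enumerate(values):
        totals[month_key][i] += value

monthly = dict(sorted(totals.items()))
print(monthly)
```

Keeping the year in the key prevents the same month from different years from being merged.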

How to Aggregate Relational Data in Stata?

Question: I can't wrap my head around the following Stata programming problem: I have a table listing all car purchases by customer and make:

    Customer | Make | Price
    -----------------------
    c1       | m1   | 1
    c1       | m1   | 2
    c1       | m3   | 1
    c2       | m2   | 2
    c3       | .    | .

I want to transform this into a table with one observation/row per customer, listing the maximum price paid for every make:

    Customer | m1 | m2 | m3
    -----------------------
    c1       | 2  | 0  | 1
    c2       | 0  | 1  | 0
    c3       | 0  | 0  | 0

How do I achieve this? I know reshape