aggregate-functions

How to calculate Mean by Date Grouped as Fiscal Quarters

Submitted by 笑着哭i on 2019-12-02 08:07:44
I have the following table:

    Date       Country  Class  Value
    6/1/2010   USA      A      45
    6/1/2010   Canada   A      23
    6/1/2010   Brazil   B      65
    9/1/2010   USA      B      47
    9/1/2010   Canada   A      98
    9/1/2010   Brazil   B      25
    12/1/2010  USA      B      14
    12/1/2010  Canada   A      79
    12/1/2010  Brazil   A      23
    3/1/2011   USA      A      84
    3/1/2011   Canada   B      77
    3/1/2011   Brazil   A      43
    6/1/2011   USA      A      45
    6/1/2011   Canada   A      23
    6/1/2011   Brazil   B      65
    9/1/2011   USA      B      47
    9/1/2011   Canada   A      98
    9/1/2011   Brazil   B      25
    12/1/2011  USA      B      14
    12/1/2011  Canada   A      79
    12/1/2011  Brazil   A      23
    3/1/2012   USA      A      84
    3/1/2012   Canada   B      77
    3/1/2012   Brazil   A      43

In column "Date", years are divided by the following months -
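For illustration, one way such a grouping could be expressed in PostgreSQL; the table name, column names, and the assumption that the fiscal year starts in April are all hypothetical:

    -- A minimal sketch: derive a fiscal quarter from the month, then
    -- average per (fiscal_year, fiscal_quarter). Fiscal year is assumed
    -- to start in April, so Apr-Jun is Q1, Jul-Sep is Q2, and so on.
    SELECT
        EXTRACT(YEAR FROM obs_date + INTERVAL '9 months')        AS fiscal_year,
        ((EXTRACT(MONTH FROM obs_date)::int + 8) % 12) / 3 + 1   AS fiscal_quarter,
        AVG(value)                                               AS mean_value
    FROM observations
    GROUP BY 1, 2
    ORDER BY 1, 2;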

How to use dplyr as alternative to aggregate

Submitted by 我的梦境 on 2019-12-02 07:41:04
Question: I have a dataframe times that looks like this:

    user  time
    A     7/7/2010
    B     7/12/2010
    C     7/12/2010
    A     7/12/2010
    C     7/15/2010

I'm using aggregate(time ~ user, times, function(x) sort(as.vector(x))) to get this:

    user  time
    A     c(7/7/2010, 7/12/2010)
    B     c(7/12/2010)
    C     c(7/12/2010, 7/15/2010)

The problem is that I have over 20 million entries in times, so aggregate is taking over 4 hours. Is there any alternative using dplyr that will get me the sorted vector of dates?

Answer 1: Updated Answer: Based on your
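For a cross-language comparison, the same "collect each user's sorted values" aggregation is a one-liner in SQL; the table and column names below are assumed (PostgreSQL syntax, with the columns renamed since user is a reserved word):

    -- Collect each user's dates into a sorted array per group.
    SELECT user_id, array_agg(visit_time ORDER BY visit_time) AS times
    FROM times
    GROUP BY user_id;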

Sybase ASE 15 Aggregate Function for strings

Submitted by 蹲街弑〆低调 on 2019-12-02 05:41:56
Question: I am looking for a way to aggregate strings from different rows into a single row in Sybase ASE 15. Like this:

    id | Name          Result:   id | Names
    -- | ------                  -- | -----------
    1  | Matt                    1  | Matt, Rocks
    1  | Rocks                   2  | Stylus
    2  | Stylus

Something like FOR XML PATH in T-SQL. Thanks!

Answer 1: Sybase ASE does not have any string aggregate functions like list() or group_concat(); and while there is some support for FOR XML, it does not include support for the PATH option/feature. Assuming you could have an unknown
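For reference, this is what the desired aggregation looks like on platforms that do ship a string-aggregate function (the answer above notes ASE 15 has none); the table name t is assumed:

    -- PostgreSQL
    SELECT id, string_agg(Name, ', ') AS Names
    FROM t
    GROUP BY id;

    -- MySQL
    SELECT id, GROUP_CONCAT(Name SEPARATOR ', ') AS Names
    FROM t
    GROUP BY id;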

Pad arrays with NULL to maximum length for custom aggregate function

Submitted by 早过忘川 on 2019-12-02 04:58:13
From the answer to the question How to use array_agg() for varchar[], we can create a custom aggregate function to aggregate n-dimensional arrays in Postgres like:

    CREATE AGGREGATE array_agg_mult (anyarray) (
        SFUNC    = array_cat,
        STYPE    = anyarray,
        INITCOND = '{}'
    );

A constraint is that the values have to share the same array extents and the same length; handling empty values and different lengths doesn't work. From the answer: There is no way around that, the array type does not allow such a mismatch in Postgres. You could pad your arrays with NULL values so that all dimensions have matching
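One way to do that padding, sketched as an assumed helper function (PostgreSQL; written for text[] rather than a polymorphic type to keep it short):

    -- Pad a text[] array with NULLs up to a target length.
    CREATE OR REPLACE FUNCTION pad_text_array(arr text[], len int)
    RETURNS text[] LANGUAGE sql IMMUTABLE AS $$
      SELECT CASE
               WHEN coalesce(array_length(arr, 1), 0) >= len THEN arr
               ELSE arr || array_fill(NULL::text,
                                      ARRAY[len - coalesce(array_length(arr, 1), 0)])
             END
    $$;

    -- Usage: pad_text_array(ARRAY['a','b'], 4)  ->  {a,b,NULL,NULL}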

Custom PostgreSQL aggregate for circular average

Submitted by 落花浮王杯 on 2019-12-02 04:10:31
Question: I'm trying to implement a custom aggregate function in Postgres which will average directions in degrees - i.e. I want to be able to do:

    SELECT circavg(direction) FROM sometable;

This can be done using the formula:

    xbar = atan2(sum(sin(xi)), sum(cos(xi)))

I think I need to define an sfunc which will take a direction, and add the sine and cosine of that into two accumulators. The final function then converts the two components back into a direction using atan2. I can't work out how to define the
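A minimal sketch of how such an aggregate could be assembled (all function names are assumed; the state is a two-element array holding the running sums of sine and cosine):

    -- Transition function: accumulate sin/cos of each direction (degrees).
    CREATE OR REPLACE FUNCTION circavg_sfunc(state float8[], direction float8)
    RETURNS float8[] LANGUAGE sql IMMUTABLE AS $$
      SELECT ARRAY[state[1] + sin(radians(direction)),
                   state[2] + cos(radians(direction))]
    $$;

    -- Final function: convert the summed components back to degrees.
    CREATE OR REPLACE FUNCTION circavg_final(state float8[])
    RETURNS float8 LANGUAGE sql IMMUTABLE AS $$
      SELECT degrees(atan2(state[1], state[2]))
    $$;

    CREATE AGGREGATE circavg (float8) (
        SFUNC     = circavg_sfunc,
        STYPE     = float8[],
        FINALFUNC = circavg_final,
        INITCOND  = '{0,0}'
    );

After which SELECT circavg(direction) FROM sometable; runs as in the question.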

Get the SUM of each person by the PersonID

Submitted by 北城以北 on 2019-12-02 03:57:30
I have the following columns in a table:

    SCORE_ID
    SCORE_PERSON_ID
    SCORE_VOTE

The SCORE_PERSON_ID is a variable. I need to sum the SCORE_VOTE per SCORE_PERSON_ID. Can you suggest a good way to do that?

Answer 1: You need a GROUP BY and an aggregate function like count or sum:

    SELECT SCORE_PERSON_ID, sum(SCORE_VOTE) as score
    FROM table
    GROUP BY `SCORE_PERSON_ID`

Answer 2:

    SELECT SUM(SCORE_VOTE)
    FROM SCORES
    GROUP BY SCORE_PERSON_ID

Answer 3: How about:

    select sum(SCORE_VOTE) as score
    from TABLE
    group by SCORE_PERSON_ID

This sums them up for each person. For a single person:

    select sum(SCORE_VOTE) as score
    from TABLE
    where SCORE_PERSON_ID = 1

I

How to apply a Shapiro test by groups in R?

Submitted by 浪子不回头ぞ on 2019-12-02 02:56:34
I have a dataframe where all my 90 variables have integer data, of the type:

    code | variable1 | variable2 | variable3 | ...
    AB   | 2         | 3         | 10        | ...
    AH   | 4         | 6         | 8         | ...
    BC   | 1         | 5         | 9         | ...
    ...  | ...       | ...       | ...

I want to apply a Shapiro test (shapiro.test {stats}) to my dataframe by variable and write the results in a table like:

    variable_name | W | p-value

Does anyone have a clue?

Answer: Using the mtcars data from R:

    mydata <- mtcars
    # Run shapiro.test on each column; keep the W statistic and p-value.
    kk <- Map(function(x) cbind(shapiro.test(x)$statistic,
                                shapiro.test(x)$p.value), mydata)
    library(plyr)
    myout <- ldply(kk)          # collapse the list into a data frame
    names(myout) <- c("var", "W", "p.value")
    myout

    var          W      p.value
    1

Is it possible to have an SQL query that uses AGG functions in this way?

Submitted by 馋奶兔 on 2019-12-02 01:32:08
Question: Assuming I have the following aggregate functions: AGG1, AGG2, AGG3, AGG4. Is it possible to write valid SQL (in a db-agnostic way) like this:

    SELECT [COL1, COL2 ....], AGG1(param1), AGG2(param2)
    FROM [SOME TABLES]
    WHERE [SOME CRITERIA]
    HAVING AGG3(param2) > -1 and AGG4(param4) < 123
    GROUP BY COL1, COL2, ... COLN
    ORDER BY COL1, COLN ASC
    LIMIT 10

Where COL1 ... COLN are columns in the tables being queried, and param1 ... paramX are parameters passed to the AGG funcs. Note: AGG1 and AGG2 are
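In standard SQL the clause order differs from the one shown: GROUP BY must precede HAVING, and HAVING may reference aggregates that are not in the SELECT list. A sketch of the working shape, with all table and column names assumed:

    -- Standard clause order: WHERE -> GROUP BY -> HAVING -> ORDER BY.
    SELECT col1, col2, SUM(amount) AS total, AVG(amount) AS mean
    FROM orders
    WHERE created_at >= DATE '2019-01-01'
    GROUP BY col1, col2
    HAVING MIN(amount) > -1 AND MAX(amount) < 123
    ORDER BY col1, col2 ASC
    LIMIT 10;

Note that LIMIT itself is not db-agnostic: SQL Server uses TOP, and older Oracle versions use ROWNUM or FETCH FIRST.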

Slow LEFT JOIN on CTE with time intervals

Submitted by ∥☆過路亽.° on 2019-12-02 01:07:31
I am trying to debug a query in PostgreSQL that I've built to bucket market data into arbitrary time intervals. Here is my table definition:

    CREATE TABLE historical_ohlcv (
        exchange_symbol TEXT                     NOT NULL,
        symbol_id       TEXT                     NOT NULL,
        kafka_key       TEXT                     NOT NULL,
        open            NUMERIC,
        high            NUMERIC,
        low             NUMERIC,
        close           NUMERIC,
        volume          NUMERIC,
        time_open       TIMESTAMP WITH TIME ZONE NOT NULL,
        time_close      TIMESTAMP WITH TIME ZONE,
        CONSTRAINT historical_ohlcv_pkey
            PRIMARY KEY (exchange_symbol, symbol_id, time_open)
    );

    CREATE INDEX symbol_id_idx ON historical_ohlcv (symbol_id);
    CREATE INDEX open_close
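The bucketing pattern being described usually looks like the sketch below: a generated series of interval starts LEFT JOINed to the rows whose time_open falls inside each bucket. The interval, date range, and symbol filter are all assumed:

    -- One-hour buckets over an assumed day of data.
    WITH buckets AS (
        SELECT g AS bucket_start,
               g + INTERVAL '1 hour' AS bucket_end
        FROM generate_series(TIMESTAMPTZ '2018-01-01',
                             TIMESTAMPTZ '2018-01-02',
                             INTERVAL '1 hour') AS g
    )
    SELECT b.bucket_start,
           min(h.low)    AS low,
           max(h.high)   AS high,
           sum(h.volume) AS volume
    FROM buckets b
    LEFT JOIN historical_ohlcv h
           ON h.time_open >= b.bucket_start
          AND h.time_open <  b.bucket_end
          AND h.symbol_id  = 'ETH'          -- assumed filter
    GROUP BY b.bucket_start
    ORDER BY b.bucket_start;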