aggregate-functions

Apply multiple functions to multiple groupby columns

百般思念 submitted on 2019-11-26 00:10:21
Question: The docs show how to apply multiple functions on a groupby object at a time using a dict with the output column names as the keys:

    In [563]: grouped['D'].agg({'result1' : np.sum,
       .....:                   'result2' : np.mean})
       .....:
    Out[563]:
          result2   result1
    A
    bar -0.579846 -1.739537
    foo -0.280588 -1.402938

However, this only works on a Series groupby object. When a dict is similarly passed to a groupby DataFrame, it expects the keys to be the column names that the function will be applied to. What I

SELECTING with multiple WHERE conditions on same column

…衆ロ難τιáo~ submitted on 2019-11-25 23:40:12
Question: Ok, I think I might be overlooking something obvious/simple here... but I need to write a query that returns only records that match multiple criteria on the same column... My table is a very simple linking setup for applying flags to a user ...

    ID   contactid  flag        flag_type
    -----------------------------------
    118  99         Volunteer   1
    119  99         Uploaded    2
    120  100        Via Import  3
    121  100        Volunteer   1
    122  100        Uploaded    2

etc... in this case you'll see both contact 99 and 100 are flagged as both "Volunteer"
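
One common way to match several values of the same column is to filter with IN, group by the contact, and keep only groups that contain every wanted flag. The sketch below runs that against an in-memory SQLite database. It assumes the goal is to find contacts that carry both the Volunteer and Uploaded flags. The table name contact_flags and the extra row for contact 101 are hypothetical, added only so the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the layout shown in the question.
conn.execute(
    "CREATE TABLE contact_flags ("
    "id INTEGER PRIMARY KEY, contactid INTEGER, flag TEXT, flag_type INTEGER)"
)
conn.executemany(
    "INSERT INTO contact_flags VALUES (?, ?, ?, ?)",
    [
        (118, 99, "Volunteer", 1),
        (119, 99, "Uploaded", 2),
        (120, 100, "Via Import", 3),
        (121, 100, "Volunteer", 1),
        (122, 100, "Uploaded", 2),
        (123, 101, "Volunteer", 1),  # hypothetical row: has only one of the two flags
    ],
)

wanted = ["Volunteer", "Uploaded"]  # assumed pair of flags to require
placeholders = ", ".join("?" for _ in wanted)

# Keep only contacts whose rows cover every wanted flag.
query = f"""
    SELECT contactid
    FROM contact_flags
    WHERE flag IN ({placeholders})
    GROUP BY contactid
    HAVING COUNT(DISTINCT flag) = ?
    ORDER BY contactid
"""
matching = [row[0] for row in conn.execute(query, (*wanted, len(wanted)))]
print(matching)  # [99, 100]
```

COUNT(DISTINCT flag) rather than COUNT(*) keeps the result correct even if the same flag is attached to a contact more than once.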

Spark SQL: apply aggregate functions to a list of columns

妖精的绣舞 submitted on 2019-11-25 23:39:59
Question: Is there a way to apply an aggregate function to all (or a list of) columns of a dataframe, when doing a groupBy? In other words, is there a way to avoid doing this for every column:

    df.groupBy("col1")
      .agg(sum("col2").alias("col2"), sum("col3").alias("col3"), ...)

Answer 1: There are multiple ways of applying aggregate functions to multiple columns. The GroupedData class provides a number of methods for the most common functions, including count, max, min, mean and sum, which can be

Two SQL LEFT JOINS produce incorrect result

喜你入骨 submitted on 2019-11-25 23:31:52
Question: I have 3 tables:

    users(id, account_balance)
    grocery(user_id, date, amount_paid)
    fishmarket(user_id, date, amount_paid)

Both fishmarket and grocery tables may have multiple occurrences for the same user_id with different dates and amounts paid or have nothing at all for any given user. When I try the following query:

    SELECT t1."id" AS "User ID",
           t1.account_balance AS "Account Balance",
           count(t2.user_id) AS "# of grocery visits",
           count(t3.user_id) AS "# of fishmarket visits"
    FROM
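
The query above is cut off, so what follows is a sketch under an assumption: that it goes on to LEFT JOIN both grocery and fishmarket directly to users. In that case every grocery row for a user pairs with every fishmarket row for the same user, and both counts get inflated. Aggregating each table in its own subquery before joining avoids the multiplication. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, account_balance REAL);
    CREATE TABLE grocery (user_id INTEGER, date TEXT, amount_paid REAL);
    CREATE TABLE fishmarket (user_id INTEGER, date TEXT, amount_paid REAL);

    -- Hypothetical data: user 1 has 2 grocery and 3 fishmarket visits, user 2 has none.
    INSERT INTO users VALUES (1, 100.0), (2, 50.0);
    INSERT INTO grocery VALUES (1, '2019-01-01', 10.0), (1, '2019-01-02', 12.0);
    INSERT INTO fishmarket VALUES (1, '2019-01-03', 20.0), (1, '2019-01-04', 22.0),
                                  (1, '2019-01-05', 24.0);
""")

# Assumed shape of the problematic query: both child tables joined before counting.
naive = conn.execute("""
    SELECT u.id, count(g.user_id), count(f.user_id)
    FROM users u
    LEFT JOIN grocery g ON g.user_id = u.id
    LEFT JOIN fishmarket f ON f.user_id = u.id
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Count each child table on its own first, then join one row per user.
fixed = conn.execute("""
    SELECT u.id, COALESCE(g.visits, 0), COALESCE(f.visits, 0)
    FROM users u
    LEFT JOIN (SELECT user_id, count(*) AS visits
               FROM grocery GROUP BY user_id) g ON g.user_id = u.id
    LEFT JOIN (SELECT user_id, count(*) AS visits
               FROM fishmarket GROUP BY user_id) f ON f.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(naive)  # [(1, 6, 6), (2, 0, 0)]  -> 2 x 3 joined rows inflate both counts
print(fixed)  # [(1, 2, 3), (2, 0, 0)]
```

COALESCE turns the NULL produced by the outer join into 0 for users with no visits.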

SQL select only rows with max value on a column [duplicate]

♀尐吖头ヾ submitted on 2019-11-25 22:50:41
Question: This question already has an answer here: Retrieving the last record in each group - MySQL (25 answers). I have this table for documents (simplified version here):

    +------+-------+--------------------------------------+
    | id   | rev   | content                              |
    +------+-------+--------------------------------------+
    | 1    | 1     | ...                                  |
    | 2    | 1     | ...                                  |
    | 1    | 2     | ...                                  |
    | 1    | 3     | ...                                  |
    +------+-------+--------------------------------------+

How do I select one row per id and only the greatest rev? With the above data,
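
One way to get one row per id with the greatest rev is to compute the maximum rev per id in a subquery and join it back to the table. This is a minimal sketch against in-memory SQLite, not necessarily the approach from the linked answers. The table name docs and the content strings are hypothetical, since the question elides the content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; (id, rev) pairs match the table in the question.
conn.execute("CREATE TABLE docs (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, 1, "first revision of doc 1"),   # placeholder content
        (2, 1, "first revision of doc 2"),
        (1, 2, "second revision of doc 1"),
        (1, 3, "third revision of doc 1"),
    ],
)

# The subquery finds the highest rev per id; the join pulls the full matching row.
latest = conn.execute("""
    SELECT d.id, d.rev, d.content
    FROM docs d
    JOIN (SELECT id, MAX(rev) AS max_rev
          FROM docs
          GROUP BY id) m
      ON m.id = d.id AND m.max_rev = d.rev
    ORDER BY d.id
""").fetchall()

print(latest)
# [(1, 3, 'third revision of doc 1'), (2, 1, 'first revision of doc 2')]
```

If two rows share the same id and the same maximum rev, this join returns both of them.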

Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause [duplicate]

回眸只為那壹抹淺笑 submitted on 2019-11-25 21:59:41
Question: This question already has answers here. Closed 6 years ago. Possible Duplicate: GROUP BY / aggregate function confusion in SQL

I got an error:

    Column 'Employee.EmpID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    select loc.LocationID, emp.EmpID
    from Employee as emp
    full join Location as loc on emp.LocationID = loc.LocationID
    group by loc.LocationID

This situation fits into the answer given by Bill Karwin. correction for