group-by

How can I get a COUNT(col) … GROUP BY to use an index?

喜欢而已 submitted on 2019-12-24 00:08:54
Question: I've got a table (col1, col2, ...) with an index on (col1, col2, ...). The table has millions of rows in it, and I want to run this query: SELECT col1, COUNT(col2) WHERE col1 NOT IN (<couple of exclusions>) GROUP BY col1. Unfortunately, this results in a full table scan, which takes upwards of a minute. Is there any way of getting Oracle to use the index on the columns to return the results much faster? EDIT: more specifically, I'm running the following query: SELECT owner,
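A common reason Oracle ignores the index here is that the indexed columns are nullable, so the index alone cannot account for every row. A minimal sketch of the usual fixes, assuming a hypothetical table big_table with an index ix_big_table on (col1, col2); the table name, index name, and hint are illustrative, not taken from the original question:

    -- Tell the optimizer every row appears in the index
    -- (rows where both columns are NULL have no B-tree entry).
    ALTER TABLE big_table MODIFY (col1 NOT NULL, col2 NOT NULL);

    -- With NOT NULL columns the aggregate can be answered from the index alone;
    -- the hint merely nudges the optimizer toward an index fast full scan.
    SELECT /*+ INDEX_FFS(t ix_big_table) */
           col1, COUNT(col2)
    FROM   big_table t
    WHERE  col1 NOT IN ('excluded_a', 'excluded_b')
    GROUP BY col1;

If altering the table is not an option, adding explicit "col1 IS NOT NULL AND col2 IS NOT NULL" predicates to the WHERE clause can have the same effect, since it lets the optimizer prove the index covers the result.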

MySQL Group By custom timestamp

亡梦爱人 submitted on 2019-12-23 23:44:07
Question: I want to get results from the database grouped by date, where each "day" ranges from 5 am to 5 am. In other words, the cutoff should be at 5 am rather than midnight. I can do GROUP BY DAY(timestamp) to group by calendar day, but what if each record should instead be grouped from 5 am to 5 am? How should I change the query? Thanks. Answer 1: Simply subtract 5 hours from each datetime value you have, and then convert it to a date.
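A sketch of that answer's approach, assuming a hypothetical table events with a DATETIME column ts; shifting every timestamp back five hours makes the 5 am boundary fall exactly on a date boundary:

    -- Rows from 05:00 today up to 04:59 tomorrow all land on today's date.
    SELECT DATE(ts - INTERVAL 5 HOUR) AS business_day,
           COUNT(*)                   AS events
    FROM   events
    GROUP BY DATE(ts - INTERVAL 5 HOUR);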

Django Models Group By

故事扮演 submitted on 2019-12-23 23:19:24
Question: I have this simple SQL query - SELECT pid, COUNT(*) AS docs FROM xml_table WHERE suid='2' GROUP BY pid; How do I express this with the Django ORM (i.e. Django models)? Basically I am not getting how to do GROUP BY. Answer 1: XML_table.objects.filter(suid='2').values('pid').annotate(docs=Count('pid')).order_by() Docs Answer 2: This works very nicely. from collections import defaultdict count = defaultdict( int ) for doc in XML_Table.objects.filter(suid='2'): count[doc.pid] += 1 It's not SQL. Often it's faster
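Answer 1 spelled out as a runnable snippet; the app path is hypothetical, the model name XML_table is taken from the answer:

    from django.db.models import Count
    from myapp.models import XML_table   # hypothetical app path

    # Equivalent of: SELECT pid, COUNT(*) AS docs FROM xml_table
    #                WHERE suid='2' GROUP BY pid;
    rows = (XML_table.objects
            .filter(suid='2')
            .values('pid')                 # GROUP BY pid
            .annotate(docs=Count('pid'))   # COUNT(*) per group
            .order_by())                   # clear default ordering so it doesn't add GROUP BY columns

The trailing order_by() matters: any default ordering on the model would otherwise be folded into the GROUP BY and split the groups.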

Php Arrays Sum Group By [duplicate]

倖福魔咒の submitted on 2019-12-23 22:33:23
Question: This question already has answers here (closed 7 years ago). Possible duplicate: php group by SUM using multi dimensional array. I'm working on a shopping cart for a wholesale company, and I'll also create an invoice later; it's the same logic. First, I multiply paket (package) * paket adeti (package quantity) * fiyatı (price) and write the result at the end of the list. Now I have to calculate VAT. The main problem is that we don't know the VAT ratios exactly beforehand. Maybe there exist 3 different VAT ratios
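A minimal sketch of the grouping step, assuming each cart line is an array with a 'vat' rate and a precomputed 'total' (package * quantity * price); the field names and values are made up for illustration:

    <?php
    $lines = [
        ['vat' => 8,  'total' => 120.00],
        ['vat' => 18, 'total' => 75.50],
        ['vat' => 8,  'total' => 40.00],
    ];

    // Sum line totals per VAT rate, then derive the VAT amount for each group.
    $byVat = [];
    foreach ($lines as $line) {
        $rate = $line['vat'];
        $byVat[$rate] = ($byVat[$rate] ?? 0) + $line['total'];
    }
    foreach ($byVat as $rate => $sum) {
        printf("VAT %d%%: base %.2f, vat %.2f\n", $rate, $sum, $sum * $rate / 100);
    }

Because the rates are not known in advance, the VAT rate itself is used as the array key, so any number of distinct rates falls out of the same loop.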

How to get average values for time intervals in Postgres

若如初见. submitted on 2019-12-23 21:45:56
Question: I'm using PostgreSQL 9.6. I have a table like this:

    mac   sn           loc   time     date      vin1    vin2    vin3
    1a34  4as11111111  aaaa  7:06:18  1/1/2018  447.42  472.32  682.59
    1a34  4as11111111  aaaa  7:06:43  1/1/2018  455.97  476.25  682.59
    1a34  4as11111111  aaaa  7:07:35  1/1/2018  470.88  484.2   682.5

I need to calculate the average of vin1, vin2, vin3 within time intervals of 300 seconds (5 min) - for example, starting from the first time (7:06:18 - 7:11:18) - for the dates in range. I can select the data I need with this query
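One common way to bucket rows into fixed 300-second windows is to round the epoch down to a multiple of 300. A sketch, assuming the date and time columns can be added together into a timestamp (column names from the question; the table name readings is an assumption):

    SELECT to_timestamp(floor(extract(epoch FROM ("date" + "time")) / 300) * 300) AS bucket_start,
           avg(vin1) AS avg_vin1,
           avg(vin2) AS avg_vin2,
           avg(vin3) AS avg_vin3
    FROM   readings
    GROUP  BY bucket_start
    ORDER  BY bucket_start;

Note this aligns buckets to clock multiples of five minutes; anchoring them to the first sample (7:06:18) instead would mean subtracting that first timestamp before dividing by 300.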

Calculating average time difference among items grouped by a specific column

北城余情 submitted on 2019-12-23 20:35:01
Question: I have the following dataframe:

    userid | time
    1      | 22.01.2001 13:00
    1      | 22.01.2001 13:05
    1      | 22.01.2001 13:07
    2      | 22.01.2001 14:00
    2      | 22.01.2001 14:04
    2      | 22.01.2001 13:05
    2      | 22.01.2001 13:06
    3      | 22.01.2001 13:20
    3      | 22.01.2001 13:22
    4      | 22.01.2001 13:37

What I want to obtain is a new column per user that stores the average time difference between consecutive activities:

    userid | avg_time_diff
    1      | 3.5   #(5 + 2) / 2
    2      | 2     #(4 + 1 + 1) / 3
    3      | 2
    4      | 0

To achieve this, do I need to loop through each user and calculate the
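No explicit loop is needed; groupby plus diff covers it. A sketch of that approach, assuming the frame is called df and its time column parses as shown (it sorts each user's activities first, and users with a single activity fall back to 0):

    import pandas as pd

    df['time'] = pd.to_datetime(df['time'], format='%d.%m.%Y %H:%M')

    avg_diff = (df.sort_values('time')
                  .groupby('userid')['time']
                  .apply(lambda s: s.diff().dt.total_seconds().div(60).mean())
                  .fillna(0)                # single-activity users have no diffs
                  .rename('avg_time_diff'))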

Having Trouble with multiple “groupby” with a variable and a category (binned data)

断了今生、忘了曾经 submitted on 2019-12-23 20:28:41
Question: df.dtypes gives

    Close      float64
    eqId       int64
    date       object
    IntDate    int64
    expiry     int64
    delta      int64
    ivMid      float64
    conf       float64
    Skew       float64
    psc        float64
    vol_B      category
    dtype: object

Running

    gb = df.groupby([df['vol_B'],df['expiry']])
    gb.describe()

I get a long error message whose final line is AttributeError: 'Categorical' object has no attribute 'flags'. When I perform a groupby on each of them separately, each one (independently) works great; I just cannot perform a multiple groupby with one of the variables
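This looks like a limitation of mixing a Categorical key with another key in older pandas versions. A hedged workaround (a sketch, not a confirmed fix for this exact version) is to group on plain values or on the bin codes instead of the Categorical itself:

    # Group on the category values cast to plain objects...
    gb = df.groupby([df['vol_B'].astype(object), df['expiry']])
    print(gb.describe())

    # ...or on the integer bin codes of the Categorical.
    gb2 = df.groupby([df['vol_B'].cat.codes, df['expiry']])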

How to do a groupby on an empty set of columns in Pandas?

别来无恙 submitted on 2019-12-23 20:24:26
Question: I am hitting a corner case in pandas. I am trying to use the agg fn but without doing a groupby. Say I want an aggregation on the entire dataframe, i.e.

    from pandas import *
    DF = DataFrame( randn(5,3), index = list( "ABCDE"), columns = list("abc") )
    DF.groupby([]).agg({'a' : np.sum, 'b' : np.mean } ) # <--- does not work

And DF.agg( {'a' ... } ) does not work either. My workaround is to do DF['Total'] = 'Total' and then DF.groupby(['Total']), but this seems a bit artificial. Has anyone
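In recent pandas versions, DataFrame.agg with a dict does work for whole-frame aggregation; on older versions a constant grouping key avoids adding an artificial column. A sketch under those assumptions:

    import numpy as np
    import pandas as pd

    DF = pd.DataFrame(np.random.randn(5, 3), index=list("ABCDE"), columns=list("abc"))

    # Newer pandas: aggregate the whole frame directly, no groupby needed.
    summary = DF.agg({'a': 'sum', 'b': 'mean'})

    # Older pandas: group every row into one constant group.
    summary_grouped = DF.groupby(np.zeros(len(DF))).agg({'a': 'sum', 'b': 'mean'})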

using groupby on pandas dataframe to group by financial year

≡放荡痞女 submitted on 2019-12-23 20:19:59
Question: I have a dataframe with a datetime64 column called DT. Is it possible to use groupby to group by financial year, from April 1 to March 31? For example,

    Date       | PE_LOW
    2010-04-01 | 15.44
    ...
    2011-03-31 | 16.8
    2011-04-02 | 17.
    ...
    2012-03-31 | 17.4

For the above data, I want to group by Fiscal Year 2010-2011 and Fiscal Year 2011-2012 without creating an extra column. Answer 1: With pandas.DatetimeIndex, that is very simple: DT.groupby(pd.DatetimeIndex(DT.Date).shift(-3,freq='m').year) Or if you use
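An alternative to the month-shift trick, still without an extra column, is pandas' anchored annual periods: the 'A-MAR' frequency makes each period run April through March. A sketch, assuming the frame is called df and Date is already datetime64:

    import pandas as pd

    # Each Date maps to the annual period ending in March, i.e. its fiscal year.
    fy = df['Date'].dt.to_period('A-MAR')

    result = df.groupby(fy)['PE_LOW'].mean()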

postgres: get top n occurrences of a value within each group

随声附和 submitted on 2019-12-23 19:46:48
Question: I have a simple table like this:

    user  letter
    --------------
    1     A
    1     A
    1     B
    1     B
    1     B
    1     C
    2     A
    2     B
    2     B
    2     C
    2     C
    2     C

I want to get the top 2 occurrences of 'letter' per user, like so:

    user  letter  rank (within user group)
    1     B       1
    1     A       2
    2     C       1
    2     B       2

or even better, collapsed into columns:

    user  1st-most-occurrence  2nd-most-occurrence
    1     B                    A
    2     C                    B

How can I accomplish this in Postgres? Answer 1: with cte as ( select t.user_id, t.letter, row_number() over(partition by t.user_id order by count(*)
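The answer's CTE is cut off above; here is a hedged completion of the same row_number-over-count pattern (the table name letters and the column name user_id are assumptions):

    WITH cte AS (
        SELECT t.user_id,
               t.letter,
               ROW_NUMBER() OVER (PARTITION BY t.user_id
                                  ORDER BY COUNT(*) DESC) AS rn
        FROM   letters t
        GROUP  BY t.user_id, t.letter
    )
    SELECT user_id, letter, rn
    FROM   cte
    WHERE  rn <= 2
    ORDER  BY user_id, rn;

The window function runs after the GROUP BY, so COUNT(*) is the per-(user, letter) frequency and rn ranks the letters within each user. Pivoting the two rows per user into "1st/2nd-most-occurrence" columns would then take a conditional aggregate or crosstab on top of this result.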