group-by

pandas groupby multiple functions

大城市里の小女人 submitted on 2020-01-11 05:45:09
Question: I want to summarize integer_transaction by EMP_NAME. Why does my first command fail, and how should I modify it? In the case of the second command, how do I avoid the warning? Is there any way to put EMP_NAME in a column instead of the index? I want this output:
Emp_name Count Sum
a        2     1
b        1     0
import pandas as pd
import numpy as np
df = pd.DataFrame(data = {'EMP_NAME': ["a", "a", "b"], 'integer_transaction': [0, 1, 0]})
x=df.groupby(['EMP_NAME'])['integer_transaction'].agg({'Frequency_count': count, 'Frequency_Sum':
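One possible way to get the requested layout is sketched below. It assumes pandas 0.25 or newer, where named aggregation is available. The first command most likely fails because `count` is written as a bare, undefined name instead of the string 'count', and passing a renaming dict to `agg` on a single column is what older pandas versions warned about. The output column names `Count` and `Sum` are taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({"EMP_NAME": ["a", "a", "b"],
                   "integer_transaction": [0, 1, 0]})

# Named aggregation: each keyword becomes an output column name.
# reset_index() moves EMP_NAME from the index back into a regular column.
summary = (
    df.groupby("EMP_NAME")["integer_transaction"]
      .agg(Count="count", Sum="sum")
      .reset_index()
)
print(summary)
```

Calling `reset_index()` at the end is a design choice; passing `as_index=False` to `groupby` is another common way to keep the key as a column.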

Getting data from first and last row of each group

故事扮演 submitted on 2020-01-11 04:52:09
Question: I've found many similar topics to this but none I can understand well enough to solve my specific case. I have a table with the following basic structure:
+----+------------+-----+
| id | session ID | bal |
+----+------------+-----+
| 0  | 00000002   | 100 |
| 1  | 00000002   | 120 |
| 2  | 00000002   | 140 |
| 3  | 00000001   | 900 |
| 4  | 00000001   | 800 |
| 5  | 00000001   | 500 |
+----+------------+-----+
I need to create a (Microsoft SQL) query which returns each unique sessionID along with the first
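A minimal sketch of one common approach, run against an in-memory SQLite database from Python as a stand-in for Microsoft SQL Server. Since the question is cut off, it assumes the goal is the first and last `bal` per session, ordered by `id`. The table name `sessions` and column name `session_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id INTEGER PRIMARY KEY, session_id TEXT, bal INTEGER)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        (0, "00000002", 100),
        (1, "00000002", 120),
        (2, "00000002", 140),
        (3, "00000001", 900),
        (4, "00000001", 800),
        (5, "00000001", 500),
    ],
)

# Find the lowest and highest id per session, then join back to the
# table twice to fetch the balance stored on each of those two rows.
first_last_sql = """
SELECT g.session_id, f.bal AS first_bal, l.bal AS last_bal
FROM (
    SELECT session_id, MIN(id) AS first_id, MAX(id) AS last_id
    FROM sessions
    GROUP BY session_id
) AS g
JOIN sessions AS f ON f.id = g.first_id
JOIN sessions AS l ON l.id = g.last_id
ORDER BY g.session_id
"""
first_last = conn.execute(first_last_sql).fetchall()
print(first_last)
```

The query uses only GROUP BY and joins, so the same shape should carry over to other SQL dialects with little change.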

Using LINQ to group a list of strings based on known substrings that they will contain

我的未来我决定 submitted on 2020-01-10 20:12:34
Question: I have a known list of strings like the following:
List<string> groupNames = new List<string>(){"Group1","Group2","Group3"};
I also have a list of strings that is not known in advance that will be something like this:
List<string> dataList = new List<string>() { "Group1.SomeOtherText", "Group1.SomeOtherText2", "Group3.MoreText", "Group2.EvenMoreText" };
I want to write a LINQ statement that will take the dataList and convert it into either an anonymous object or a dictionary that has a Key of
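The grouping logic itself can be sketched in Python as a stand-in for the LINQ statement. Because the question is truncated, this assumes the key is the group name and the value is the list of data strings containing that name. The snake_case variable names are illustrative only.

```python
group_names = ["Group1", "Group2", "Group3"]
data_list = [
    "Group1.SomeOtherText",
    "Group1.SomeOtherText2",
    "Group3.MoreText",
    "Group2.EvenMoreText",
]

# For each known group name, collect every data string that contains it.
grouped = {
    name: [item for item in data_list if name in item]
    for name in group_names
}
print(grouped)
```

A plain substring test would also match "Group1" inside "Group10", so a stricter check such as `item.startswith(name + ".")` may be preferable if group names can overlap.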

Cannot use group by and over(partition by) in the same query?

血红的双手。 submitted on 2020-01-10 10:05:23
Question: I have a table myTable with 3 columns. col_1 is an INTEGER and the other 2 columns are DOUBLE. For example, col_1={1, 2}, col_2={0.1, 0.2, 0.3}. Each element in col_1 is paired with all the values of col_2, and col_2 has repeated values for each element in col_1. The 3rd column can have any value, as shown below:
col_1 | col_2 | Value
----------------------
1     | 0.1   | 1.0
1     | 0.2   | 2.0
1     | 0.2   | 3.0
1     | 0.3   | 4.0
1     | 0.3   | 5.0
2     | 0.1   | 6.0
2     | 0.1   | 7.0
2     | 0.1   | 8.0
2     | 0.2   | 9.0
2     | 0.3   |

Select random row for each group

岁酱吖の submitted on 2020-01-09 10:25:07
Question: I have a table like this:
ID ATTRIBUTE
1  A
1  A
1  B
1  C
2  B
2  C
2  C
3  A
3  B
3  C
I'd like to select just one random attribute for each ID. The result could therefore look like this (although this is just one of many options):
ATTRIBUTE
B
C
C
This is my attempt at this problem:
SELECT "ATTRIBUTE" FROM ( SELECT "ID", "ATTRIBUTE", row_number() OVER (PARTITION BY "ID" ORDER BY random()) rownum FROM table ) shuffled WHERE rownum = 1
However, I don't know if this is a good solution, as I need to
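The attempt in the question can be tried out with a small sketch against an in-memory SQLite database from Python. This assumes SQLite 3.25 or newer for window function support. The table name `attrs` is hypothetical, chosen because `table` is a reserved word.

```python
import sqlite3

rows = [
    (1, "A"), (1, "A"), (1, "B"), (1, "C"),
    (2, "B"), (2, "C"), (2, "C"),
    (3, "A"), (3, "B"), (3, "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (id INTEGER, attribute TEXT)")
conn.executemany("INSERT INTO attrs VALUES (?, ?)", rows)

# Number the rows of each id in random order, then keep row 1 per id.
random_pick_sql = """
SELECT id, attribute
FROM (
    SELECT id, attribute,
           row_number() OVER (PARTITION BY id ORDER BY random()) AS rn
    FROM attrs
) AS shuffled
WHERE rn = 1
ORDER BY id
"""
picked = conn.execute(random_pick_sql).fetchall()
print(picked)  # one (id, attribute) pair per id; the attribute varies per run
```

Because duplicate rows stay in the table, an attribute that appears twice for an ID is twice as likely to be picked; deduplicating first would give each distinct attribute an equal chance.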

How to find the record in a table that contains the maximum value?

怎甘沉沦 submitted on 2020-01-09 09:38:53
Question: Although this question looks simple, it is kind of tricky. I have a table with the following columns:
table A:
int ID
float value
datetime date
varchar(50) group
I would like to obtain the "ID" and "value" of the records that contain the maximum "date" grouped by the column "group". Something like "what is the newest value for each group?" I can get each group and its maximum date:
SELECT group, MAX(date) FROM A GROUP BY group; -- I also need the "ID" and "value"
But I would like to have the
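One common pattern for this is to join the table back to its own per-group maximum. The sketch below runs it against an in-memory SQLite database from Python. The sample rows are made up for illustration, dates are stored as ISO text so that MAX compares them correctly, and `group` is quoted because it is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE A (ID INTEGER, value REAL, date TEXT, "group" TEXT)'
)
# Hypothetical sample data, not from the question.
conn.executemany(
    "INSERT INTO A VALUES (?, ?, ?, ?)",
    [
        (1, 10.0, "2020-01-01", "x"),
        (2, 12.5, "2020-01-05", "x"),
        (3, 7.0, "2020-01-03", "y"),
        (4, 9.0, "2020-01-02", "y"),
    ],
)

# The subquery yields the newest date per group; joining on both the
# group and that date recovers the full row, including ID and value.
newest_sql = """
SELECT t.ID, t.value, t."group"
FROM A AS t
JOIN (
    SELECT "group", MAX(date) AS max_date
    FROM A
    GROUP BY "group"
) AS latest
  ON latest."group" = t."group" AND latest.max_date = t.date
ORDER BY t."group"
"""
newest = conn.execute(newest_sql).fetchall()
print(newest)
```

If two rows in a group share the same maximum date, this join returns both of them, so a tie-breaking rule would be needed to guarantee a single row per group.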

Calculating percentages with GROUP BY query

让人想犯罪 __ submitted on 2020-01-09 09:10:52
Question: I have a table with 3 columns which looks like this:
File   User  Rating (1-5)
------------------------------
00001  1     3
00002  1     4
00003  2     2
00004  3     5
00005  4     3
00005  3     2
00006  2     3
Etc.
I want to generate a query that outputs the following (for each user and rating, display the number of files as well as the percentage of files):
User  Rating  Count  Percentage
-----------------------------------
1     1       3      .18
1     2       6      .35
1     3       8      .47
2     5       12     .75
2     3       4      .25
With PostgreSQL, I know how to create a query that
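A minimal sketch of one way to compute the per-user share, run against an in-memory SQLite database from Python as a stand-in for PostgreSQL. It uses only the seven sample rows shown in the question, so the counts differ from the desired output table. The table name `ratings` and column names `file_id` and `user_id` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (file_id TEXT, user_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        ("00001", 1, 3),
        ("00002", 1, 4),
        ("00003", 2, 2),
        ("00004", 3, 5),
        ("00005", 4, 3),
        ("00005", 3, 2),
        ("00006", 2, 3),
    ],
)

# Count rows per (user, rating) and divide by that user's total row count.
# Multiplying by 1.0 forces floating-point rather than integer division.
percent_sql = """
SELECT r.user_id,
       r.rating,
       COUNT(*) AS cnt,
       ROUND(COUNT(*) * 1.0 / totals.total, 2) AS pct
FROM ratings AS r
JOIN (
    SELECT user_id, COUNT(*) AS total
    FROM ratings
    GROUP BY user_id
) AS totals ON totals.user_id = r.user_id
GROUP BY r.user_id, r.rating, totals.total
ORDER BY r.user_id, r.rating
"""
percentages = conn.execute(percent_sql).fetchall()
print(percentages)
```

In a database with window functions, the join could be replaced by dividing by `SUM(COUNT(*)) OVER (PARTITION BY user_id)`; the join form is used here only because it needs nothing beyond GROUP BY.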

Multiple Self-Join based on GROUP BY results

牧云@^-^@ submitted on 2020-01-09 08:17:57
Question: I'm attempting to collect details about backup activity from a PostgreSQL DB table on a backup appliance (Avamar). The table has several columns including: client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:
| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|-------
