group-by | 易学教程

pandas groupby multiple functions

Question: I want to summarize integer_transaction by EMP_NAME. Why does my first command fail, and how should I modify it? In the case of the second command, how do I avoid the warning? Is there any way to put EMP_NAME in a column instead of the index? I want this output:

    Emp_name  Count  Sum
    a         2      1
    b         1      0

    import pandas as pd
    import numpy as np
    df = pd.DataFrame(data={'EMP_NAME': ["a", "a", "b"], 'integer_transaction': [0, 1, 0]})
    x = df.groupby(['EMP_NAME'])['integer_transaction'].agg({'Frequency_count': count, 'Frequency_Sum':
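A minimal sketch of one way to get the requested layout, assuming pandas 0.25 or later (the release that introduced named aggregation). In pandas 1.0 and later, passing a dict of {new_name: function} to a SeriesGroupBy's `.agg` raises `SpecificationError` ("nested renamer is not supported"); older releases only emitted a deprecation warning for it. Separately, a bare `count` is an undefined name unless it was defined earlier. Named aggregation with string function names avoids both problems, and `reset_index()` moves EMP_NAME from the index into an ordinary column. The column names `Count` and `Sum` are taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame(data={"EMP_NAME": ["a", "a", "b"],
                        "integer_transaction": [0, 1, 0]})

# Named aggregation: output_column="aggregation name", passed as keyword arguments
summary = (
    df.groupby("EMP_NAME")["integer_transaction"]
      .agg(Count="count", Sum="sum")
      .reset_index()  # turn the EMP_NAME index back into an ordinary column
      .rename(columns={"EMP_NAME": "Emp_name"})
)

print(summary)
```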

Getting data from first and last row of each group

Question: I've found many similar topics to this, but none I understand well enough to solve my specific case. I have a table with the following basic structure:

    +----+------------+-----+
    | id | session ID | bal |
    +----+------------+-----+
    | 0  | 00000002   | 100 |
    | 1  | 00000002   | 120 |
    | 2  | 00000002   | 140 |
    | 3  | 00000001   | 900 |
    | 4  | 00000001   | 800 |
    | 5  | 00000001   | 500 |
    +----+------------+-----+

I need to create a (Microsoft SQL) query which returns each unique sessionID along with the first
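A hedged sketch of the first/last-row-per-group pattern, run against an in-memory SQLite database from Python so it executes as-is (window functions need SQLite 3.25 or later). The table name `sessions` and the column `session_id` are hypothetical stand-ins for the question's table and its "session ID" column, and "first"/"last" are assumed to mean the rows with the lowest and highest `id`. The CTE, `ROW_NUMBER() OVER (...)` and `CASE` expressions are standard SQL that SQL Server also accepts.

```python
import sqlite3

def first_last_per_session(conn):
    # ROW_NUMBER() is computed twice: ascending by id marks the first row of each
    # session, descending by id marks the last; conditional aggregation picks them out.
    query = """
        WITH ranked AS (
            SELECT session_id, bal,
                   ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY id ASC)  AS rn_first,
                   ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY id DESC) AS rn_last
            FROM sessions
        )
        SELECT session_id,
               MAX(CASE WHEN rn_first = 1 THEN bal END) AS first_bal,
               MAX(CASE WHEN rn_last  = 1 THEN bal END) AS last_bal
        FROM ranked
        GROUP BY session_id
        ORDER BY session_id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id INTEGER, session_id TEXT, bal INTEGER)")
# The sample rows from the question
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    (0, "00000002", 100), (1, "00000002", 120), (2, "00000002", 140),
    (3, "00000001", 900), (4, "00000001", 800), (5, "00000001", 500),
])
rows = first_last_per_session(conn)
print(rows)  # [('00000001', 900, 500), ('00000002', 100, 140)]
```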

Using LINQ to group a list of strings based on known substrings that they will contain

Question: I have a known list of strings like the following:

    List<string> groupNames = new List<string>(){"Group1","Group2","Group3"};

I also have a list of strings that is not known in advance and will be something like this:

    List<string> dataList = new List<string>() { "Group1.SomeOtherText", "Group1.SomeOtherText2", "Group3.MoreText", "Group2.EvenMoreText" };

I want to write a LINQ statement that will take the dataList and convert it into either an anonymous object or a dictionary that has a Key of
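The excerpt is cut off before the desired key/value shape is spelled out, so the following is only a sketch of the grouping logic. It is written in Python to match the other examples here rather than in C#, and a dictionary keyed by group name holding the matching strings is an assumed target. In LINQ the same idea is typically expressed with `ToDictionary` or `GroupBy` combined with `StartsWith`. The function name `group_by_known_prefix` is hypothetical.

```python
def group_by_known_prefix(group_names, data_list):
    """Map each known group name to the data strings that begin with it.

    Assumes (hypothetically) that a group name is always followed by '.' in the
    data strings, so that "Group1" does not also claim "Group10.Text".
    Strings that match no known group are dropped.
    """
    result = {name: [] for name in group_names}  # every known group gets a key
    for item in data_list:
        for name in group_names:
            if item.startswith(name + "."):
                result[name].append(item)
                break  # each item belongs to at most one group
    return result

group_names = ["Group1", "Group2", "Group3"]
data_list = ["Group1.SomeOtherText", "Group1.SomeOtherText2",
             "Group3.MoreText", "Group2.EvenMoreText"]
grouped = group_by_known_prefix(group_names, data_list)
print(grouped)
```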

Cannot use group by and over(partition by) in the same query?

Question: I have a table myTable with 3 columns. col_1 is an INTEGER and the other 2 columns are DOUBLE. For example, col_1={1, 2}, col_2={0.1, 0.2, 0.3}. Each element in col_1 is combined with all the values of col_2, and col_2 has repeated values for each element in col_1. The 3rd column can have any value, as shown below:

    col_1 | col_2 | Value
    ------+-------+------
    1     | 0.1   | 1.0
    1     | 0.2   | 2.0
    1     | 0.2   | 3.0
    1     | 0.3   | 4.0
    1     | 0.3   | 5.0
    2     | 0.1   | 6.0
    2     | 0.1   | 7.0
    2     | 0.1   | 8.0
    2     | 0.2   | 9.0
    2     | 0.3   |
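GROUP BY and OVER (PARTITION BY ...) can appear in the same query: the window function is evaluated after grouping, so its argument must itself be a grouped column or an aggregate, which is why the aggregate is nested as `SUM(SUM(Value))`. A minimal sketch, run on SQLite from Python (3.25 or later assumed) and loaded with only the complete rows visible in the excerpt. Computing each (col_1, col_2) subtotal next to the col_1 total is an assumed goal, since the question text is cut off; the function name is hypothetical.

```python
import sqlite3

def subtotals_with_partition_total(conn):
    # GROUP BY runs first; the window function then sees one row per (col_1, col_2)
    # group and adds up those per-group sums across each col_1 partition.
    query = """
        SELECT col_1, col_2,
               SUM(Value) AS group_sum,
               SUM(SUM(Value)) OVER (PARTITION BY col_1) AS col_1_total
        FROM myTable
        GROUP BY col_1, col_2
        ORDER BY col_1, col_2
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (col_1 INTEGER, col_2 REAL, Value REAL)")
# Only the complete rows shown in the question excerpt
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", [
    (1, 0.1, 1.0), (1, 0.2, 2.0), (1, 0.2, 3.0), (1, 0.3, 4.0), (1, 0.3, 5.0),
    (2, 0.1, 6.0), (2, 0.1, 7.0), (2, 0.1, 8.0), (2, 0.2, 9.0),
])
rows = subtotals_with_partition_total(conn)
for row in rows:
    print(row)
```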

Select random row for each group

Question: I have a table like this:

    ID  ATTRIBUTE
    1   A
    1   A
    1   B
    1   C
    2   B
    2   C
    2   C
    3   A
    3   B
    3   C

I'd like to select just one random attribute for each ID. The result could therefore look like this (although this is just one of many options):

    ATTRIBUTE
    B
    C
    C

This is my attempt at this problem:

    SELECT "ATTRIBUTE"
    FROM (
        SELECT "ID", "ATTRIBUTE",
               row_number() OVER (PARTITION BY "ID" ORDER BY random()) rownum
        FROM table
    ) shuffled
    WHERE rownum = 1

However, I don't know if this is a good solution, as I need to
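The window-function approach in the question does return one row per ID. Here is a runnable sketch of the same idea on SQLite from Python (3.25 or later assumed), with one small change: the outer query also keeps the ID, so each random attribute can be matched to its group. The table name `attributes` is a hypothetical replacement for `table`, which is a reserved word and would need quoting. Note that `ORDER BY random()` shuffles every row of every partition, so the cost grows with table size.

```python
import sqlite3

def random_attribute_per_id(conn):
    # Shuffle rows inside each id partition, then keep the first row of each partition.
    query = """
        SELECT id, attribute
        FROM (
            SELECT id, attribute,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY random()) AS rownum
            FROM attributes
        ) AS shuffled
        WHERE rownum = 1
        ORDER BY id
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attributes (id INTEGER, attribute TEXT)")
# The sample rows from the question
conn.executemany("INSERT INTO attributes VALUES (?, ?)", [
    (1, "A"), (1, "A"), (1, "B"), (1, "C"),
    (2, "B"), (2, "C"), (2, "C"),
    (3, "A"), (3, "B"), (3, "C"),
])
rows = random_attribute_per_id(conn)
print(rows)  # a different pick on each run
```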

How to find the record in a table that contains the maximum value?

Question: Although this question looks simple, it is kind of tricky. I have a table with the following columns:

    table A:
      int          ID
      float        value
      datetime     date
      varchar(50)  group

I would like to obtain the "ID" and "value" of the records that contain the maximum "date", grouped by the column "group". Something like "what is the newest value for each group?" I can get each group and its maximum date:

    SELECT group, MAX(date) FROM A GROUP BY group; -- I also need the "ID" and "value"

But I would like to have the
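One common way to extend the question's own GROUP BY query is to join it back to the table, so the row holding each group's maximum date supplies its ID and value. A minimal sketch on SQLite from Python; the sample rows are hypothetical, since the question shows no data, and dates are stored as ISO-8601 text, which sorts chronologically. If two rows in a group share the maximum date, both are returned.

```python
import sqlite3

def latest_value_per_group(conn):
    # Subquery m: the latest date per group (the question's own GROUP BY query).
    # The join then recovers the ID and value of the row(s) with that date.
    query = """
        SELECT t.ID, t.value, t."group", t.date
        FROM A AS t
        JOIN (
            SELECT "group", MAX(date) AS max_date
            FROM A
            GROUP BY "group"
        ) AS m
          ON t."group" = m."group" AND t.date = m.max_date
        ORDER BY t."group"
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
# "group" is a reserved word, so it must be quoted.
conn.execute('CREATE TABLE A (ID INTEGER, value REAL, date TEXT, "group" TEXT)')
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", [  # hypothetical sample rows
    (1, 10.0, "2024-01-01", "x"),
    (2, 20.0, "2024-03-01", "x"),
    (3, 5.0, "2024-02-01", "y"),
    (4, 7.0, "2024-01-15", "y"),
])
rows = latest_value_per_group(conn)
print(rows)  # [(2, 20.0, 'x', '2024-03-01'), (3, 5.0, 'y', '2024-02-01')]
```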

Calculating percentages with GROUP BY query

Question: I have a table with 3 columns which looks like this:

    File   User  Rating (1-5)
    ------------------------------
    00001  1     3
    00002  1     4
    00003  2     2
    00004  3     5
    00005  4     3
    00005  3     2
    00006  2     3

Etc. I want to generate a query that outputs the following (for each user and rating, display the number of files as well as the percentage of files):

    User  Rating  Count  Percentage
    -----------------------------------
    1     1       3      .18
    1     2       6      .35
    1     3       8      .47
    2     5       12     .75
    2     3       4      .25

With PostgreSQL, I know how to create a query that
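A hedged sketch of the per-user percentage, using a window function over the grouped counts: `COUNT(*)` gives files per (user, rating), and `SUM(COUNT(*)) OVER (PARTITION BY user_id)` gives that user's total after grouping. It runs on SQLite from Python (3.25 or later assumed) and should also be valid PostgreSQL; multiplying by `1.0` avoids integer division there. The column name `user_id` is a hypothetical replacement for `User`, which is reserved in PostgreSQL. With the question's small sample, each user has at most two ratings, so the percentages come out as 0.5 or 1.0 rather than the figures in the desired output.

```python
import sqlite3

def rating_percentages(conn):
    # file_count: files per (user, rating); the window sums those counts per user.
    query = """
        SELECT user_id, rating, COUNT(*) AS file_count,
               ROUND(COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (PARTITION BY user_id), 2)
                   AS percentage
        FROM ratings
        GROUP BY user_id, rating
        ORDER BY user_id, rating
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (file TEXT, user_id INTEGER, rating INTEGER)")
# The sample input rows from the question
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", [
    ("00001", 1, 3), ("00002", 1, 4), ("00003", 2, 2), ("00004", 3, 5),
    ("00005", 4, 3), ("00005", 3, 2), ("00006", 2, 3),
])
rows = rating_percentages(conn)
for row in rows:
    print(row)
```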

Multiple Self-Join based on GROUP BY results

Question: I'm attempting to collect details about backup activity from a PostgreSQL DB table on a backup appliance (Avamar). The table has several columns, including client_name, dataset, plugin_name, type, completed_ts, status_code, bytes_modified and more. Simplified example:

| session_id | client_name | dataset | plugin_name | type | completed_ts | status_code | bytes_modified |
|------------|-------------|---------|---------------------|------------------|----------------------|-------------|-------
