group-by

MySQL shows incorrect rows when using GROUP BY

独自空忆成欢 submitted on 2019-12-18 06:56:21
Question: I have two tables:

article('id', 'ticket_id', 'incoming_time', 'to', 'from', 'message')
ticket('id', 'queue_id')

where tickets represent a thread of emails between support staff and customers, and articles are the individual messages that compose a thread. I'm looking to find the article with the highest incoming time (expressed as a Unix timestamp) for each ticket_id, and this is the query I'm currently using:

SELECT article.*, MAX(article.incoming_time) as maxtime
FROM ticket, article
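
The truncated query above mixes MAX() with non-aggregated columns, which is the classic greatest-n-per-group trap. A minimal sketch of the usual fix, joining each article back against its ticket's maximum incoming_time; it uses Python with sqlite3 purely as a runnable illustration, and the table columns are trimmed down from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket  (id INTEGER PRIMARY KEY, queue_id INTEGER);
    CREATE TABLE article (id INTEGER PRIMARY KEY, ticket_id INTEGER,
                          incoming_time INTEGER, message TEXT);
    INSERT INTO ticket  VALUES (1, 10), (2, 20);
    INSERT INTO article VALUES
        (1, 1, 100, 'older reply'),
        (2, 1, 200, 'latest reply for ticket 1'),
        (3, 2, 150, 'latest reply for ticket 2');
""")

# Join each article against its ticket's maximum incoming_time instead of
# relying on GROUP BY to pick matching non-aggregated columns.
rows = conn.execute("""
    SELECT a.*
    FROM article a
    JOIN (SELECT ticket_id, MAX(incoming_time) AS maxtime
          FROM article
          GROUP BY ticket_id) m
      ON a.ticket_id = m.ticket_id
     AND a.incoming_time = m.maxtime
""").fetchall()
print(rows)   # one row per ticket: the article with the highest incoming_time
```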

GROUP BY using parameters in SQL

不打扰是莪最后的温柔 submitted on 2019-12-18 06:44:06
Question: I am trying to somehow group a report based on a pre-defined drop-down list of parameters. I want to be able to subtotal the Total Hours or Total Pay of my report based on Department or JobCode. I have created the parameters and have no problem with that; I'm just not sure if it's possible to use those parameters to call out a grouping command. Below is the spirit of what I am wanting, but the GROUP BY clause doesn't work for me even without a parameter.

SELECT EmployeeID, LastName,
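
The excerpt cuts off before the grouping part, but the underlying issue is that a grouping column cannot be bound as an ordinary query parameter. A minimal sketch of the whitelist-and-interpolate workaround, shown with Python and sqlite3 outside any reporting tool; the table and column names here are invented from the excerpt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payroll (EmployeeID INTEGER, LastName TEXT,
                          Department TEXT, JobCode TEXT, TotalHours REAL);
    INSERT INTO payroll VALUES
        (1, 'Smith', 'Sales', 'A1', 40),
        (2, 'Jones', 'Sales', 'B2', 35),
        (3, 'Brown', 'IT',    'A1', 42);
""")

# A column name cannot be bound like a value parameter, so validate the
# drop-down selection against a whitelist and interpolate it explicitly.
group_param = "Department"            # would come from the report parameter
allowed = {"Department", "JobCode"}
if group_param not in allowed:
    raise ValueError(f"unsupported grouping column: {group_param}")

query = (f"SELECT {group_param}, SUM(TotalHours) AS TotalHours "
         f"FROM payroll GROUP BY {group_param}")
for row in conn.execute(query):
    print(row)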

Why doesn't first and last in a groupby give me first and last

空扰寡人 submitted on 2019-12-18 06:26:05
Question: I'm posting this because the topic just got brought up in another question/answer and the behavior isn't very well documented. Consider the dataframe df:

df = pd.DataFrame(dict(A=list('xxxyyy'), B=[np.nan, 1, 2, 3, 4, np.nan]))

   A    B
0  x  NaN
1  x  1.0
2  x  2.0
3  y  3.0
4  y  4.0
5  y  NaN

I wanted to get the first and last rows of each group defined by column 'A'. I tried

df.groupby('A').B.agg(['first', 'last'])

   first  last
A
x    1.0   2.0
y    3.0   4.0

However, this doesn't give me the np.NaNs that I
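
A minimal sketch of the behavior the question describes: agg(['first', 'last']) skips NaN, while the purely positional head(1)/tail(1) keep the actual boundary rows (assuming that is what the truncated question is after):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=list('xxxyyy'),
                       B=[np.nan, 1, 2, 3, 4, np.nan]))

# 'first'/'last' skip NaN, so group x reports 1.0 and group y reports 4.0
print(df.groupby('A').B.agg(['first', 'last']))

# head(1)/tail(1) are positional and keep the NaN boundary rows
print(df.groupby('A').head(1))
print(df.groupby('A').tail(1))
```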

pandas divide row value by aggregated sum with a condition set by other cell

陌路散爱 submitted on 2019-12-18 05:14:35
Question: Hi, hoping to get some help. I have a two-column DataFrame df:

Source  ID
1       2
2       3
1       2
1       2
1       3
3       1

My intention is to group by Source and divide each ID cell by the total for its Source group, then attach this to the original dataframe, so the new column would look like:

Source  ID  ID_new
1       2   2/9
2       3   3/3
1       2   2/9
1       2   2/9
1       3   3/9
3       1   3/1

I've gotten as far as

df.groupby('Source ID')['ID'].sum()

to get the total for ID, but I'm not sure where to go next.

Answer 1: try this:

In [79]: df.assign(ID_new=df
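
The accepted answer is cut off above; a minimal sketch of the transform-based approach it appears to start, which broadcasts each group's total back onto the original rows:

```python
import pandas as pd

df = pd.DataFrame({'Source': [1, 2, 1, 1, 1, 3],
                   'ID':     [2, 3, 2, 2, 3, 1]})

# transform('sum') returns a Series aligned with df's index, so the division
# happens row by row against each row's own group total
df['ID_new'] = df['ID'] / df.groupby('Source')['ID'].transform('sum')
print(df)   # e.g. 2/9 for the Source=1 rows, 3/3 for the Source=2 row
```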

Summarize different Columns with different Functions

风格不统一 submitted on 2019-12-18 05:08:13
Question: I have the following problem: in a data frame I have a lot of rows and columns, with the first column being the date. For each date I have more than one observation and I want to summarize them. My df looks like this (date replaced by ID for ease of use):

df:
ID  Cash  Price  Weight  ...
1   0.4   0      0
1   0.2   0      82      ...
1   0     1      0       ...
1   0     3.2    80      ...
2   0.3   1      70      ...
...

I want to group them by the first column and then summarize all rows BUT with different functions: The function Cash and Price
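
The excerpt stops before naming the functions, and it does not say which language is in use, so purely as an illustration here is the per-column aggregation pattern in pandas, with made-up function choices (sum for Cash and Price, mean for Weight):

```python
import pandas as pd

df = pd.DataFrame({'ID':     [1, 1, 1, 1, 2],
                   'Cash':   [0.4, 0.2, 0.0, 0.0, 0.3],
                   'Price':  [0.0, 0.0, 1.0, 3.2, 1.0],
                   'Weight': [0, 82, 0, 80, 70]})

# A dict passed to agg() applies a different function to each column
summary = df.groupby('ID').agg({'Cash': 'sum', 'Price': 'sum', 'Weight': 'mean'})
print(summary)
```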

Groupby column and find min and max of each group

大城市里の小女人 submitted on 2019-12-18 04:25:18
Question: I have the following dataset,

       Day    Element  Data_Value
6786   01-01  TMAX     112
9333   01-01  TMAX     101
9330   01-01  TMIN     60
11049  01-01  TMIN     0
6834   01-01  TMIN     25
11862  01-01  TMAX     113
1781   01-01  TMAX     115
11042  01-01  TMAX     105
1110   01-01  TMAX     111
651    01-01  TMIN     44
11350  01-01  TMIN     83
1798   01-02  TMAX     70
4975   01-02  TMAX     79
12774  01-02  TMIN     0
3977   01-02  TMIN     60
2485   01-02  TMAX     73
4888   01-02  TMIN     31
11836  01-02  TMIN     26
11368  01-02  TMAX     71
2483   01-02  TMIN     26

I want to group by the Day and then find the overall
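
The body is cut off, but the title asks for the minimum and maximum of each group; a minimal sketch with a trimmed-down version of the data:

```python
import pandas as pd

df = pd.DataFrame({'Day':        ['01-01', '01-01', '01-01', '01-02', '01-02', '01-02'],
                   'Element':    ['TMAX',  'TMIN',  'TMAX',  'TMAX',  'TMIN',  'TMIN'],
                   'Data_Value': [112,     60,      115,     70,      0,       26]})

# One row per Day with the overall minimum and maximum of Data_Value
print(df.groupby('Day')['Data_Value'].agg(['min', 'max']))
```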

LEFT JOIN after GROUP BY?

。_饼干妹妹 submitted on 2019-12-18 04:07:15
Question: I have a table of "Songs", "Songs_Tags" (relating songs with tags) and "Songs_Votes" (relating songs with boolean like/dislike). I need to retrieve the songs with a GROUP_CONCAT() of their tags and also the number of likes (true) and dislikes (false). My query is something like this:

SELECT s.*,
       GROUP_CONCAT(st.id_tag) AS tags_ids,
       COUNT(CASE WHEN v.vote=1 THEN 1 ELSE NULL END) as votesUp,
       COUNT(CASE WHEN v.vote=0 THEN 1 ELSE NULL END) as votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id
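
The query is truncated above, but joining Songs to both Songs_Tags and Songs_Votes before a single GROUP BY multiplies the rows, which inflates both the concatenated tags and the vote counts. A minimal sketch of the usual fix, aggregating each one-to-many table in its own derived table first; it runs on Python with sqlite3 for illustration, and the id_song column name is an assumption:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Songs       (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Songs_Tags  (id_song INTEGER, id_tag INTEGER);
    CREATE TABLE Songs_Votes (id_song INTEGER, vote INTEGER);
    INSERT INTO Songs       VALUES (1, 'song A');
    INSERT INTO Songs_Tags  VALUES (1, 10), (1, 11);
    INSERT INTO Songs_Votes VALUES (1, 1), (1, 1), (1, 0);
""")

# Aggregate tags and votes separately so neither join multiplies the other's rows
rows = conn.execute("""
    SELECT s.*, t.tags_ids,
           COALESCE(v.votesUp, 0)   AS votesUp,
           COALESCE(v.votesDown, 0) AS votesDown
    FROM Songs s
    LEFT JOIN (SELECT id_song, GROUP_CONCAT(id_tag) AS tags_ids
               FROM Songs_Tags GROUP BY id_song) t ON t.id_song = s.id
    LEFT JOIN (SELECT id_song,
                      SUM(vote = 1) AS votesUp,
                      SUM(vote = 0) AS votesDown
               FROM Songs_Votes GROUP BY id_song) v ON v.id_song = s.id
""").fetchall()
print(rows)   # [(1, 'song A', '10,11', 2, 1)]
```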

How to create Pandas groupby plot with subplots?

巧了我就是萌 submitted on 2019-12-18 03:39:35
Question: I have a data frame like this:

            value     identifier
2007-01-01  0.781611  55
2007-01-01  0.766152  56
2007-01-01  0.766152  57
2007-02-01  0.705615  55
2007-02-01  0.032134  56
2007-02-01  0.032134  57
2008-01-01  0.026512  55
2008-01-01  0.993124  56
2008-01-01  0.993124  57
2008-02-01  0.226420  55
2008-02-01  0.033860  56
2008-02-01  0.033860  57

So I do a groupby per identifier:

df.groupby('identifier')

And now I want to generate subplots in a grid, one plot per group. I tried both df.groupby('identifier').plot
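
A minimal sketch of the usual pattern: create the axes grid up front and hand one axis to each group's plot call (the sample frame below is abbreviated from the question):

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'value':      [0.781611, 0.766152, 0.766152,
                                  0.705615, 0.032134, 0.032134],
                   'identifier': [55, 56, 57, 55, 56, 57]},
                  index=pd.to_datetime(['2007-01-01'] * 3 + ['2007-02-01'] * 3))

groups = df.groupby('identifier')
fig, axes = plt.subplots(ncols=len(groups), sharey=True, figsize=(9, 3))
for ax, (name, group) in zip(axes, groups):
    group['value'].plot(ax=ax, title=str(name))   # one subplot per identifier
plt.tight_layout()
plt.show()
```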

Pandas groupby(),agg() - how to return results without the multi index?

天大地大妈咪最大 submitted on 2019-12-18 03:17:52
Question: I have a dataframe:

pe_odds[ [ 'EVENT_ID', 'SELECTION_ID', 'ODDS' ] ]
Out[67]:
    EVENT_ID   SELECTION_ID   ODDS
0   100429300  5297529        18.00
1   100429300  5297529        20.00
2   100429300  5297529        21.00
3   100429300  5297529        22.00
4   100429300  5297529        23.00
5   100429300  5297529        24.00
6   100429300  5297529        25.00

When I use groupby and agg, I get results with a multi-index:

pe_odds.groupby( [ 'EVENT_ID', 'SELECTION_ID' ] )[ 'ODDS' ].agg( [ np.min, np.max ] )
Out[68]:
                         amin  amax
EVENT_ID   SELECTION_ID
100428417  5490293       1
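
The excerpt ends mid-output, but the usual answer is either reset_index() after the aggregation or as_index=False on the groupby; a minimal sketch with made-up odds values:

```python
import pandas as pd

pe_odds = pd.DataFrame({'EVENT_ID':     [100429300] * 3 + [100428417] * 2,
                        'SELECTION_ID': [5297529] * 3 + [5490293] * 2,
                        'ODDS':         [18.0, 20.0, 21.0, 1.29, 1.30]})

# reset_index() turns the EVENT_ID/SELECTION_ID MultiIndex back into columns
flat = (pe_odds.groupby(['EVENT_ID', 'SELECTION_ID'])['ODDS']
               .agg(['min', 'max'])
               .reset_index())
print(flat)
```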