group-by

Bitwise operation in Group By

Submitted by ぐ巨炮叔叔 on 2020-08-09 16:04:30
Question: I need a bitwise OR aggregate in a GROUP BY query, but I couldn't find one built into T-SQL.

Table (PermissionId, BitMask bigint):

    1, 4
    2, 7
    1, 8
    1, 5

I want the result to be the bitwise OR of each group's masks:

    1, 13
    2, 7

How can I write this in T-SQL, as in the following?

    SELECT PermissionId, BIT_OR(BitMask)
    FROM table
    GROUP BY PermissionId

Answer 1: Your question just became very interesting. Create this function (you may want to reconsider the name):

    CREATE FUNCTION f_test (@param bigint)
    RETURNS @t TABLE (value bigint)
    AS
    BEGIN
    ;WITH CTE AS (
    SELECT …
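For readers who want to see the grouped bitwise OR in action, here is a minimal pandas sketch of the same computation (T-SQL has no built-in BIT_OR aggregate, which is why the answer above builds a function by hand); the column names are taken from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"PermissionId": [1, 2, 1, 1],
                   "BitMask": [4, 7, 8, 5]})

# OR together all masks within each PermissionId group: 4 | 8 | 5 == 13
result = df.groupby("PermissionId")["BitMask"].agg(
    lambda s: np.bitwise_or.reduce(s.to_numpy()))
print(result)  # PermissionId 1 -> 13, PermissionId 2 -> 7
```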

Comparing values of a certain row with a certain number of previous rows in data.table

Submitted by 陌路散爱 on 2020-08-05 10:11:46
Question: This is an extension of an earlier question. In a dataset containing firm and category values, I want to calculate the following: if a firm enters a new category that it has not been engaged in during the three (3) previous years (not including the same year), that entry is labelled "NEW"; otherwise it is labelled "OLD". In the following dataset:

    df <- data.table(year = c(1979,1979,1980,1980,1981,1981,1982,1983,1983,1984,1984),
                     category = c("A","A","B","C","A","D","F","F","C","A" …
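The R snippet above is truncated, so here is a minimal pandas sketch of the three-year lookback; the final category value is an assumption made to complete the data:

```python
import pandas as pd

df = pd.DataFrame({"year": [1979, 1979, 1980, 1980, 1981, 1981,
                            1982, 1983, 1983, 1984, 1984],
                   "category": ["A", "A", "B", "C", "A", "D",
                                "F", "F", "C", "A", "B"]})  # last value assumed

def label(row, data, window=3):
    # "NEW" if the category did not appear in the `window` years before row["year"]
    prior = data[(data["year"] < row["year"]) &
                 (data["year"] >= row["year"] - window)]
    return "OLD" if row["category"] in set(prior["category"]) else "NEW"

df["status"] = df.apply(lambda r: label(r, df), axis=1)
```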

Efficient way to group indices of the same elements in a list

Submitted by 蓝咒 on 2020-08-02 05:01:33
Question: Let's say I have a list that looks like:

    [1, 2, 2, 5, 8, 3, 3, 9, 0, 1]

Now I want to group the indices of equal elements, so the result should look like:

    [[0, 9], [1, 2], [3], [4], [5, 6], [7], [8]]

How do I do this efficiently? I want to avoid loops, so implementations using numpy/pandas functions are welcome.

Answer 1: Using pandas GroupBy.apply, this is pretty straightforward: group a Series of indices by your data. A nice bonus here is that you get to keep the order of your …
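A compact sketch of that approach, grouping the positional indices by the value at each position (sort=False keeps the groups in first-appearance order):

```python
import pandas as pd

a = [1, 2, 2, 5, 8, 3, 3, 9, 0, 1]

# Group the positions 0..n-1 by the value found at each position
s = pd.Series(range(len(a)))
out = s.groupby(a, sort=False).apply(list).tolist()
print(out)  # [[0, 9], [1, 2], [3], [4], [5, 6], [7], [8]]
```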

Comparing value of a certain row with all previous rows in data.table

Submitted by 谁说胖子不能爱 on 2020-07-21 03:02:08
Question: I have a dataset of firms involved in certain categories of products. It looks like this:

    df <- data.table(year = c(1979,1979,1980,1980,1980,1981,1981,1982,1982,1982,1982),
                     category = c("A","A","B","C","A","D","C","F","F","A","B"))

I want to create a new variable as follows: if a firm enters a new category that it has not been engaged in during previous years (not the same year), that entry is labelled "NEW"; otherwise it is labelled "OLD". As such, the …
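A minimal pandas sketch of the all-previous-years check (the simpler cousin of the three-year window above):

```python
import pandas as pd

df = pd.DataFrame({"year": [1979, 1979, 1980, 1980, 1980, 1981,
                            1981, 1982, 1982, 1982, 1982],
                   "category": ["A", "A", "B", "C", "A", "D",
                                "C", "F", "F", "A", "B"]})

def label(row, data):
    # "NEW" if the category never appeared in any strictly earlier year
    seen = set(data.loc[data["year"] < row["year"], "category"])
    return "OLD" if row["category"] in seen else "NEW"

df["status"] = df.apply(lambda r: label(r, df), axis=1)
```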

SQL Server: Average counts by hour and day of week

Submitted by 久未见 on 2020-07-17 10:31:29
Question:

Background: I have a table in a SQL Server environment containing a log of various activity that I'm tracking. Each log item uses a unique code to categorize the activity taking place, and a datetime field records when the activity occurred.

Problem: Using either a single query or a stored procedure, I would like to get an average of the hourly counts of activity, grouped by day of the week. Example:

    Day     | Hour | Average Count
    -------------------------------
    Monday  | 8    | 5
    Monday  …
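The SQL answer is cut off above; as an illustration of the computation itself (count per calendar date and hour, then average those counts across dates sharing the same weekday), here is a pandas sketch; the column name logged_at is an assumption:

```python
import pandas as pd

# Toy log: one row per tracked activity (the column name "logged_at" is assumed)
log = pd.DataFrame({"logged_at": pd.to_datetime([
    "2020-07-06 08:15", "2020-07-06 08:40",  # Monday, hour 8: count 2
    "2020-07-13 08:05",                      # next Monday, hour 8: count 1
])})

ts = log["logged_at"]
counts = (log.groupby([ts.dt.date.rename("date"),
                       ts.dt.day_name().rename("day"),
                       ts.dt.hour.rename("hour")])
             .size().rename("n").reset_index())

# Average the per-date counts over all dates sharing the same day/hour
avg = counts.groupby(["day", "hour"])["n"].mean().reset_index(name="avg_count")
print(avg)  # Monday, hour 8 -> 1.5
```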

Python pandas unique value ignoring NaN

Submitted by 爱⌒轻易说出口 on 2020-07-17 07:45:06
Question: I want to use unique in a groupby aggregation, but I don't want NaN in the unique result. An example dataframe:

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3],
                       'b': [0, 0, 1, 1, 1, 1, 1],
                       'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})

         a  b    c
    0  1.0  0  foo
    1  2.0  0  NaN
    2  1.0  1  bar
    3  1.0  1  foo
    4  NaN  1  baz
    5  3.0  1  foo
    6  3.0  1  bar

And the groupby:

    df.groupby('b').agg({'a': ['min', 'max', 'unique'],
                         'c': ['first', 'last', 'unique']})

Its result is:

         a                c
       min max unique …
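One way to keep NaN out of the unique lists is a custom aggregation that drops missing values before collecting uniques; a minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 1, np.nan, 3, 3],
                   'b': [0, 0, 1, 1, 1, 1, 1],
                   'c': ['foo', np.nan, 'bar', 'foo', 'baz', 'foo', 'bar']})

# Drop missing values, then collect the remaining uniques as a plain list
def nan_free_unique(s):
    return s.dropna().unique().tolist()

out = df.groupby('b').agg({'a': ['min', 'max', nan_free_unique],
                           'c': ['first', 'last', nan_free_unique]})
```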

How to pivot a dataframe

Submitted by 早过忘川 on 2020-07-10 07:34:22
Question: What is a pivot? How do I pivot? Is this a pivot? How do I go from long format to wide format?

I've seen a lot of questions that ask about pivot tables. Even when askers don't know they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting... but I'm going to give it a go. The problem with existing questions and answers is that the question is often focused on a nuance that the OP has trouble …
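As a one-glance orientation before the canonical answer (which is cut off above), a minimal long-to-wide pivot looks like this:

```python
import pandas as pd

# Long format: one (row, col, val) triple per line
long_df = pd.DataFrame({"row": ["r0", "r0", "r1", "r1"],
                        "col": ["c0", "c1", "c0", "c1"],
                        "val": [1, 2, 3, 4]})

# Wide format: `row` becomes the index, distinct `col` values become columns
wide = long_df.pivot(index="row", columns="col", values="val")
# col  c0  c1
# row
# r0    1   2
# r1    3   4
```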

pandas groupby & filter on count

Submitted by 随声附和 on 2020-07-09 12:45:26
Question: I want to capture the categorical values whose occurrence count is above a certain threshold.

df:

    ticket_id  category  amount  --> some more columns
    1020       cat1      1000
    1022       cat1      55
    1023       cat1      12291
    1120       cat2      140
    1121       cat3      1250

(There are many more rows, mostly cat5 (1020) and cat1 (98), plus some cat3 and cat4, and no more cat2.)

    >>> df.groupby('category')['amount'].count()
    category
    cat1     100
    cat2       1
    cat3       6
    cat4       2
    cat5    1020

I want to get the categories with count > 20 in a list. Currently I'm doing:

    >>> t = test.groupby( …
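A compact way to get that list, sketched on a toy frame built from the rows shown in the question (the threshold is lowered to 2 so it bites on the toy data; the real data would use > 20):

```python
import pandas as pd

df = pd.DataFrame({"ticket_id": [1020, 1022, 1023, 1120, 1121],
                   "category": ["cat1", "cat1", "cat1", "cat2", "cat3"],
                   "amount": [1000, 55, 12291, 140, 1250]})

# Count rows per category, then keep only the categories above the threshold
counts = df.groupby("category")["amount"].count()
keep = counts[counts > 2].index.tolist()
print(keep)  # ['cat1']
```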

How to fill in rows based on event type data

Submitted by 我的未来我决定 on 2020-07-09 11:54:09
Question: My table has 2 columns: hour and customerID. Every customer has 2 rows, one for the hour he/she came into the store and one for the hour he/she left the store. With this data, I want to create a table that has one row for every hour a customer has been in the store. For example, customer X entered the store at 1PM and left at 5PM, so there would be 5 rows (1 for each hour), as in the screenshot below. Here's my attempt so far:

    select hour
          ,first_value …
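The SQL attempt is cut off above; as a sketch of the expansion itself (turning each customer's entry/exit pair into one row per hour), here is a pandas version:

```python
import pandas as pd

# Two rows per customer: the hour they entered and the hour they left
events = pd.DataFrame({"customerID": ["X", "X"],
                       "hour": [13, 17]})  # entered 1PM, left 5PM

# Expand each customer's [min, max] hour span into one row per hour
spans = events.groupby("customerID")["hour"].agg(["min", "max"])
out = pd.DataFrame([(cust, h)
                    for cust, row in spans.iterrows()
                    for h in range(row["min"], row["max"] + 1)],
                   columns=["customerID", "hour"])
print(out)  # 5 rows for customer X: hours 13 through 17
```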

Pandas: Group by a column that meets a condition

Submitted by 纵饮孤独 on 2020-07-08 11:22:07
Question: I have a data set with three columns: rating, breed, and dog.

    import pandas as pd
    dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
            'dog': [True, True, True, False],
            'rating': [8.0, 9.0, 10.0, 7.0]}
    df = pd.DataFrame(data=dogs)

I would like to calculate the mean rating per breed where dog is True. This is the expected output:

           breed  rating
    0  Chihuahua     8.5
    1  Dalmatian    10.0

This has been my attempt:

    df.groupby('breed')['rating'].mean().where(dog == True)

And this is the error …
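A sketch of the filter-then-group approach that produces the expected output (filter the rows before the groupby, rather than applying .where afterwards):

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Keep only rows where dog is True, then average ratings per breed
out = df.loc[df['dog']].groupby('breed', as_index=False)['rating'].mean()
print(out)
#        breed  rating
# 0  Chihuahua     8.5
# 1  Dalmatian    10.0
```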