aggregate

Aggregate with max and factors

ε祈祈猫儿з submitted on 2019-12-22 08:44:14

Question: I have a data.frame with columns of factors, on which I want to compute a max (or min, or quantiles). I can't use these functions on factors directly, but I want to. Here's some example data:

    set.seed(3)
    df1 <- data.frame(id = rep(1:5, each = 2),
                      height = sample(c("low", "medium", "high"), size = 10, replace = TRUE))
    df1$height <- factor(df1$height, c("low", "medium", "high"))
    df1$height_num <- as.numeric(df1$height)
    # > df1
    #   id height height_num
    # 1  1    low          1
    # 2  1   high          3
    # 3  2 medium          2
    # 4  2    low          1
    # 5  3 medium          2
    # 6  3
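The R snippet is cut off above, but the underlying trick (encode the ordered levels as ranks, take the max per group, map back to labels) can be sketched in plain Python; the data here is a hypothetical stand-in for the example's df1:

```python
# Ordered levels of the "factor", lowest to highest.
levels = ["low", "medium", "high"]
rank = {lvl: i for i, lvl in enumerate(levels)}

# Hypothetical per-id observations, mirroring the example's structure.
heights = {1: ["low", "high"], 2: ["medium", "low"], 3: ["medium", "medium"]}

# "max" of a factor = the observed level with the highest rank in each group.
max_height = {gid: max(vals, key=rank.__getitem__) for gid, vals in heights.items()}
print(max_height)  # {1: 'high', 2: 'medium', 3: 'medium'}
```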

How to get percentage of counts of a column after groupby in Pandas

╄→尐↘猪︶ㄣ submitted on 2019-12-22 08:08:23

Question: I'm trying to get the distribution of grades for each rank for the names in a list of data. However, I can't figure out how to get the proportion/percentage of each grade count over its rank group. Here's an example:

    df.head()
    name  rank grade
    Bob      1     A
    Bob      1     A
    Bob      1     B
    Bob      1     C
    Bob      2     B
    Bob      3     C
    Joe      1     C
    Joe      2     B
    Joe      2     B
    Joe      3     A
    Joe      3     B
    Joe      3     B

I use grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size() to give me the count of each grade within its (name, rank) group:

    name  rank grade
    Bob      1
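To make the intended computation concrete, here is a stdlib-only sketch: count each (name, rank, grade) triple, then divide by the total count of its (name, rank) group, using the question's example data:

```python
from collections import Counter

rows = [("Bob", 1, "A"), ("Bob", 1, "A"), ("Bob", 1, "B"), ("Bob", 1, "C"),
        ("Bob", 2, "B"), ("Bob", 3, "C"), ("Joe", 1, "C"), ("Joe", 2, "B"),
        ("Joe", 2, "B"), ("Joe", 3, "A"), ("Joe", 3, "B"), ("Joe", 3, "B")]

grade_count = Counter(rows)                        # count per (name, rank, grade)
group_total = Counter((n, r) for n, r, _ in rows)  # count per (name, rank)

# Proportion of each grade count over its (name, rank) group total.
proportion = {key: cnt / group_total[key[:2]] for key, cnt in grade_count.items()}
print(proportion[("Bob", 1, "A")])  # 0.5 (2 of Bob's 4 rank-1 grades are A)
```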

Finding all rows with unique combination of two columns

拥有回忆 submitted on 2019-12-22 06:03:35

Question: I have this table messages:

    sender_id recipient_id
            1            2
            1            3
            1            3
            2            1
            3            1
            2            3

I wish to select rows such that either sender_id or recipient_id equals current_user.id, and the other field is unique. I.e. I want to select unique rows from the table where sender_id = 2 or recipient_id = 2, and I need this result:

    sender_id recipient_id
            2            1
            2            3

How to do it? Why? Because I wish to build a Facebook-like inbox in which sent and received messages are aggregated, and this query is the bottleneck so far. I am
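One approach (sketched with SQLite via Python's stdlib rather than the asker's database, so treat dialect details as assumptions): reduce each matching row to the "other party" with a CASE expression, then deduplicate with DISTINCT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender_id INTEGER, recipient_id INTEGER)")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 3), (2, 1), (3, 1), (2, 3)])

# For user 2: keep rows involving user 2, reduce each to the "other" user,
# and deduplicate, giving one row per conversation partner.
others = conn.execute("""
    SELECT DISTINCT CASE WHEN sender_id = 2 THEN recipient_id
                         ELSE sender_id END AS other
    FROM messages
    WHERE sender_id = 2 OR recipient_id = 2
    ORDER BY other
""").fetchall()
print(others)  # [(1,), (3,)]
```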

Conditional merging tables

无人久伴 submitted on 2019-12-21 20:45:08

Question: I have 2 tables:

    Time       X1
    8/1/2013   56
    9/1/2013   14
    10/1/2013   8
    11/1/2013   4
    12/1/2013  78

    Time       X2
    8/1/2013   42
    9/1/2013   44
    10/1/2013   2
    11/1/2013  75
    12/1/2013  36

How can I merge those 2 tables into one, grouping by "Time", but with one condition: each month from the first table must match the following month from the second. For example, September from the first table should match October from the second table. Thank you!

Answer 1: This is a perfect job for a data.table rolling join:

    library(data.table)
    setkey(setDT
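The answer's code is truncated at setkey(setDT. The shift-and-join idea itself is simple and can be sketched without data.table, here in stdlib Python with the question's numbers:

```python
from datetime import date

x1 = {date(2013, 8, 1): 56, date(2013, 9, 1): 14, date(2013, 10, 1): 8,
      date(2013, 11, 1): 4, date(2013, 12, 1): 78}
x2 = {date(2013, 8, 1): 42, date(2013, 9, 1): 44, date(2013, 10, 1): 2,
      date(2013, 11, 1): 75, date(2013, 12, 1): 36}

def next_month(d):
    # First day of the month after d (handles the December wrap).
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)

# Pair month m from the first table with month m+1 from the second.
merged = {t: (v1, x2[nm]) for t, v1 in x1.items() if (nm := next_month(t)) in x2}
print(merged[date(2013, 9, 1)])  # (14, 2): September's X1 with October's X2
```

December 2013 drops out because the second table has no January 2014 row; whether that is the desired behaviour is an assumption.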

R - Specifying a desired row order for the output data.frame of aggregate()

♀尐吖头ヾ submitted on 2019-12-21 20:17:07

Question: I aggregate() the value column sums per site level of the R data.frame given below:

    set.seed(2013)
    df <- data.frame(site = sample(c("A", "B", "C"), 10, replace = TRUE),
                     currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 10,
                                       replace = TRUE, prob = c(10, 6, 5, 6, 0.5)),
                     value = sample(seq(1:10)/10, 10, replace = FALSE))
    df.site.sums <- aggregate(value ~ site, data = df, FUN = sum)
    df.site.sums
    #   site value
    # 1    A   0.2
    # 2    B   0.6
    # 3    C   4.7

However, I would like to be able to specify the row order of the resulting
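The question is truncated before the desired order, but the general pattern is: aggregate first, then emit rows in an explicitly chosen order. A stdlib Python sketch (the order ["C", "A", "B"] is a hypothetical target):

```python
# Hypothetical (site, value) rows whose per-site sums match the example output.
rows = [("A", 0.2), ("C", 1.5), ("B", 0.6), ("C", 3.2)]

sums = {}
for site, value in rows:
    sums[site] = sums.get(site, 0.0) + value

# Desired row order, specified explicitly by the caller.
order = ["C", "A", "B"]
result = [(site, round(sums[site], 1)) for site in order]
print(result)  # [('C', 4.7), ('A', 0.2), ('B', 0.6)]
```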

Customizing rolling_apply function in Python pandas

只愿长相守 submitted on 2019-12-21 20:04:03

Question (Setup): I have a DataFrame with three columns:

    - "Category" contains True and False, and I have done df.groupby('Category') to group by these values.
    - "Time" contains timestamps (measured in seconds) at which values have been recorded.
    - "Value" contains the values themselves.

At each time instance, two values are recorded: one has category True, and the other has category False.

Rolling apply question: Within each category group, I want to compute a number and store it in column Result for each
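The question is cut off before the actual computation, so here is a generic per-group rolling apply in stdlib Python: a window of size 2 and a mean function, both hypothetical stand-ins for whatever the asker computes:

```python
from collections import defaultdict

# (category, time, value) records; both categories observed at each time.
records = [(True, 0, 1.0), (False, 0, 10.0), (True, 1, 3.0),
           (False, 1, 20.0), (True, 2, 5.0), (False, 2, 30.0)]

# Group values by category, in time order.
groups = defaultdict(list)
for cat, t, v in sorted(records, key=lambda r: r[1]):
    groups[cat].append(v)

def rolling_apply(values, window, func):
    # None until the window fills, mirroring the usual rolling-window behaviour.
    return [func(values[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(values))]

mean = lambda xs: sum(xs) / len(xs)
result = {cat: rolling_apply(vs, 2, mean) for cat, vs in groups.items()}
print(result[True])  # [None, 2.0, 4.0]
```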

How can I improve this SQL query?

微笑、不失礼 submitted on 2019-12-21 18:13:16

Question: I ran into an interesting SQL problem today, and while I came up with a solution that works, I doubt it's the best or most efficient answer. I defer to the experts here: help me learn something and improve my query! The RDBMS is SQL Server 2008 R2, and the query is part of an SSRS report that will run against about 100,000 rows. Essentially, I have a list of IDs that can each have multiple values associated with them, the values being Yes, No, or some other string. For ID x, if any of the values are a Yes, x
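The rule, as far as it is stated ("for ID x, if any of the values are a Yes..."), is the classic conditional-aggregation pattern. A sketch with SQLite via Python's stdlib (the non-Yes fallback 'No' is an assumption, since the question is truncated; SQL Server syntax would be essentially the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "No"), (1, "Yes"), (1, "Maybe"),
                  (2, "No"), (2, "No"), (3, "Maybe")])

# One row per id: 'Yes' if any of its values is Yes, else 'No' (assumed).
rows = conn.execute("""
    SELECT id,
           CASE WHEN MAX(CASE WHEN value = 'Yes' THEN 1 ELSE 0 END) = 1
                THEN 'Yes' ELSE 'No' END AS any_yes
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Yes'), (2, 'No'), (3, 'No')]
```

Conditional aggregation scans the table once, which usually beats self-joins or correlated subqueries at the 100,000-row scale mentioned.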

Aggregating table() over multiple columns in R without a “by” breakdown

流过昼夜 submitted on 2019-12-21 17:18:20

Question: I have a 2-column data frame of x- and y-coordinates of points. I want to generate a table of the number of occurrences of each point. Using the table() command produces a table for all possible x-y pairs. I can eliminate the extras with

    fullTable <- table(coords)
    smallTable <- subset(fullTable, fullTable > 0)

and then I'm sure I could do a little something with dimnames(fullTable) to get the appropriate coordinates, but is there a better way? Something built in? Something that with coords <-
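For comparison, the "observed pairs only" count the asker is after is what Python's Counter over zipped coordinates produces directly, with no zero-filled grid to subset away (the coordinates here are made up):

```python
from collections import Counter

xs = [1, 1, 2, 2, 2, 3]
ys = [5, 5, 6, 6, 7, 5]

# Count only the (x, y) pairs that actually occur; no full cross-tabulation.
pair_counts = Counter(zip(xs, ys))
print(pair_counts[(1, 5)])  # 2
```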