aggregate

Aggregate with max and factors

ε祈祈猫儿з submitted on 2019-12-22 08:44:14

Question: I have a data.frame with columns of factors, on which I want to compute a max (or min, or quantiles). I can't use these functions on factors directly, but I want to. Here's some example data:

    set.seed(3)
    df1 <- data.frame(id = rep(1:5, each = 2),
                      height = sample(c("low", "medium", "high"), size = 10, replace = TRUE))
    df1$height <- factor(df1$height, c("low", "medium", "high"))
    df1$height_num <- as.numeric(df1$height)
    # > df1
    #   id height height_num
    # 1  1    low          1
    # 2  1   high          3
    # 3  2 medium          2
    # 4  2    low          1
    # 5  3 medium          2
    # 6  3
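The R snippet is cut off above, but the underlying trick (encode the ordered levels as ranks, take the max per group, map back to labels) can be sketched in plain Python; the data here is a hypothetical stand-in for the example's df1:

```python
# Ordered levels of the "factor", lowest to highest.
levels = ["low", "medium", "high"]
rank = {lvl: i for i, lvl in enumerate(levels)}

# Hypothetical per-id observations, mirroring the example's structure.
heights = {1: ["low", "high"], 2: ["medium", "low"], 3: ["medium", "medium"]}

# "max" of a factor = the observed level with the highest rank in each group.
max_height = {gid: max(vals, key=rank.__getitem__) for gid, vals in heights.items()}
print(max_height)  # {1: 'high', 2: 'medium', 3: 'medium'}
```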

How to get percentage of counts of a column after groupby in Pandas

╄→尐↘猪︶ㄣ submitted on 2019-12-22 08:08:23

Question: I'm trying to get the distribution of grades for each rank for the names in a list of data. However, I can't figure out how to get the proportion/percentage of each grade count over its rank group. Here's an example:

    df.head()
    name  rank grade
    Bob      1     A
    Bob      1     A
    Bob      1     B
    Bob      1     C
    Bob      2     B
    Bob      3     C
    Joe      1     C
    Joe      2     B
    Joe      2     B
    Joe      3     A
    Joe      3     B
    Joe      3     B

I use grade_count = df.groupby(['name', 'rank', 'grade'])['grade'].size() to give me the count of each grade within its (name, rank) group:

    name  rank grade
    Bob      1
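To make the intended computation concrete, here is a stdlib-only sketch: count each (name, rank, grade) triple, then divide by the total count of its (name, rank) group, using the question's example data:

```python
from collections import Counter

rows = [("Bob", 1, "A"), ("Bob", 1, "A"), ("Bob", 1, "B"), ("Bob", 1, "C"),
        ("Bob", 2, "B"), ("Bob", 3, "C"), ("Joe", 1, "C"), ("Joe", 2, "B"),
        ("Joe", 2, "B"), ("Joe", 3, "A"), ("Joe", 3, "B"), ("Joe", 3, "B")]

grade_count = Counter(rows)                        # count per (name, rank, grade)
group_total = Counter((n, r) for n, r, _ in rows)  # count per (name, rank)

# Proportion of each grade count over its (name, rank) group total.
proportion = {key: cnt / group_total[key[:2]] for key, cnt in grade_count.items()}
print(proportion[("Bob", 1, "A")])  # 0.5 (2 of Bob's 4 rank-1 grades are A)
```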

Finding all rows with unique combination of two columns

拥有回忆 submitted on 2019-12-22 06:03:35

Question: I have this table messages:

    sender_id recipient_id
            1            2
            1            3
            1            3
            2            1
            3            1
            2            3

I wish to select rows such that either sender_id or recipient_id equals current_user.id, and the other field is unique. I.e. I want to select unique rows from the table where sender_id = 2 or recipient_id = 2, and I need this result:

    sender_id recipient_id
            2            1
            2            3

How to do it? Why? Because I wish to build a Facebook-like inbox in which sent and received messages are aggregated, and this query is the bottleneck so far. I am
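One approach (sketched with SQLite via Python's stdlib rather than the asker's database, so treat dialect details as assumptions): reduce each matching row to the "other party" with a CASE expression, then deduplicate with DISTINCT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (sender_id INTEGER, recipient_id INTEGER)")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(1, 2), (1, 3), (1, 3), (2, 1), (3, 1), (2, 3)])

# For user 2: keep rows involving user 2, reduce each to the "other" user,
# and deduplicate, giving one row per conversation partner.
others = conn.execute("""
    SELECT DISTINCT CASE WHEN sender_id = 2 THEN recipient_id
                         ELSE sender_id END AS other
    FROM messages
    WHERE sender_id = 2 OR recipient_id = 2
    ORDER BY other
""").fetchall()
print(others)  # [(1,), (3,)]
```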

Conditional merging tables

无人久伴 submitted on 2019-12-21 20:45:08

Question: I have 2 tables:

    Time       X1
    8/1/2013   56
    9/1/2013   14
    10/1/2013   8
    11/1/2013   4
    12/1/2013  78

    Time       X2
    8/1/2013   42
    9/1/2013   44
    10/1/2013   2
    11/1/2013  75
    12/1/2013  36

How can I merge those 2 tables into one, grouping by "Time", but with one condition: each month from the first table must match the following month from the second. For example, September from the first table should match October from the second table. Thank you!

Answer 1: This is a perfect job for a data.table rolling join:

    library(data.table)
    setkey(setDT
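The answer's code is truncated at setkey(setDT. The shift-and-join idea itself is simple and can be sketched without data.table, here in stdlib Python with the question's numbers:

```python
from datetime import date

x1 = {date(2013, 8, 1): 56, date(2013, 9, 1): 14, date(2013, 10, 1): 8,
      date(2013, 11, 1): 4, date(2013, 12, 1): 78}
x2 = {date(2013, 8, 1): 42, date(2013, 9, 1): 44, date(2013, 10, 1): 2,
      date(2013, 11, 1): 75, date(2013, 12, 1): 36}

def next_month(d):
    # First day of the month after d (handles the December wrap).
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)

# Pair month m from the first table with month m+1 from the second.
merged = {t: (v1, x2[nm]) for t, v1 in x1.items() if (nm := next_month(t)) in x2}
print(merged[date(2013, 9, 1)])  # (14, 2): September's X1 with October's X2
```

December 2013 drops out because the second table has no January 2014 row; whether that is the desired behaviour is an assumption.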

R - Specifying a desired row order for the output data.frame of aggregate()

♀尐吖头ヾ submitted on 2019-12-21 20:17:07

Question: I aggregate() the value column sums per site level of the R data.frame given below:

    set.seed(2013)
    df <- data.frame(site = sample(c("A", "B", "C"), 10, replace = TRUE),
                     currency = sample(c("USD", "EUR", "GBP", "CNY", "CHF"), 10,
                                       replace = TRUE, prob = c(10, 6, 5, 6, 0.5)),
                     value = sample(seq(1:10)/10, 10, replace = FALSE))
    df.site.sums <- aggregate(value ~ site, data = df, FUN = sum)
    df.site.sums
    #   site value
    # 1    A   0.2
    # 2    B   0.6
    # 3    C   4.7

However, I would like to be able to specify the row order of the resulting
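The question is truncated before the desired order, but the general pattern is: aggregate first, then emit rows in an explicitly chosen order. A stdlib Python sketch (the order ["C", "A", "B"] is a hypothetical target):

```python
# Hypothetical (site, value) rows whose per-site sums match the example output.
rows = [("A", 0.2), ("C", 1.5), ("B", 0.6), ("C", 3.2)]

sums = {}
for site, value in rows:
    sums[site] = sums.get(site, 0.0) + value

# Desired row order, specified explicitly by the caller.
order = ["C", "A", "B"]
result = [(site, round(sums[site], 1)) for site in order]
print(result)  # [('C', 4.7), ('A', 0.2), ('B', 0.6)]
```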

Customizing rolling_apply function in Python pandas

只愿长相守 submitted on 2019-12-21 20:04:03

Question (Setup): I have a DataFrame with three columns:

    - "Category" contains True and False, and I have done df.groupby('Category') to group by these values.
    - "Time" contains timestamps (measured in seconds) at which values have been recorded.
    - "Value" contains the values themselves.

At each time instance, two values are recorded: one has category True, and the other has category False.

Rolling apply question: Within each category group, I want to compute a number and store it in column Result for each
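The question is cut off before the actual computation, so here is a generic per-group rolling apply in stdlib Python: a window of size 2 and a mean function, both hypothetical stand-ins for whatever the asker computes:

```python
from collections import defaultdict

# (category, time, value) records; both categories observed at each time.
records = [(True, 0, 1.0), (False, 0, 10.0), (True, 1, 3.0),
           (False, 1, 20.0), (True, 2, 5.0), (False, 2, 30.0)]

# Group values by category, in time order.
groups = defaultdict(list)
for cat, t, v in sorted(records, key=lambda r: r[1]):
    groups[cat].append(v)

def rolling_apply(values, window, func):
    # None until the window fills, mirroring the usual rolling-window behaviour.
    return [func(values[i - window + 1:i + 1]) if i >= window - 1 else None
            for i in range(len(values))]

mean = lambda xs: sum(xs) / len(xs)
result = {cat: rolling_apply(vs, 2, mean) for cat, vs in groups.items()}
print(result[True])  # [None, 2.0, 4.0]
```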

How can I improve this SQL query?

微笑、不失礼 submitted on 2019-12-21 18:13:16

Question: I ran into an interesting SQL problem today, and while I came up with a solution that works, I doubt it's the best or most efficient answer. I defer to the experts here: help me learn something and improve my query! The RDBMS is SQL Server 2008 R2, and the query is part of an SSRS report that will run against about 100,000 rows. Essentially, I have a list of IDs that can each have multiple values associated with them, the values being Yes, No, or some other string. For ID x, if any of the values are a Yes, x
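The rule, as far as it is stated ("for ID x, if any of the values are a Yes..."), is the classic conditional-aggregation pattern. A sketch with SQLite via Python's stdlib (the non-Yes fallback 'No' is an assumption, since the question is truncated; SQL Server syntax would be essentially the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, value TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "No"), (1, "Yes"), (1, "Maybe"),
                  (2, "No"), (2, "No"), (3, "Maybe")])

# One row per id: 'Yes' if any of its values is Yes, else 'No' (assumed).
rows = conn.execute("""
    SELECT id,
           CASE WHEN MAX(CASE WHEN value = 'Yes' THEN 1 ELSE 0 END) = 1
                THEN 'Yes' ELSE 'No' END AS any_yes
    FROM t
    GROUP BY id
    ORDER BY id
""").fetchall()
print(rows)  # [(1, 'Yes'), (2, 'No'), (3, 'No')]
```

Conditional aggregation scans the table once, which usually beats self-joins or correlated subqueries at the 100,000-row scale mentioned.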

Aggregating table() over multiple columns in R without a “by” breakdown

流过昼夜 submitted on 2019-12-21 17:18:20

Question: I have a 2-column data frame of x- and y-coordinates of points. I want to generate a table of the number of occurrences of each point. Using the table() command produces a table for all possible x-y pairs. I can eliminate the extras with

    fullTable <- table(coords)
    smallTable <- subset(fullTable, fullTable > 0)

and then I'm sure I could do a little something with dimnames(fullTable) to get the appropriate coordinates, but is there a better way? Something built in? Something that with coords <-
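For comparison, the "observed pairs only" count the asker is after is what Python's Counter over zipped coordinates produces directly, with no zero-filled grid to subset away (the coordinates here are made up):

```python
from collections import Counter

xs = [1, 1, 2, 2, 2, 3]
ys = [5, 5, 6, 6, 7, 5]

# Count only the (x, y) pairs that actually occur; no full cross-tabulation.
pair_counts = Counter(zip(xs, ys))
print(pair_counts[(1, 5)])  # 2
```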