aggregate | 易学教程

Aggregate based on each item in a special character separated column in Pandas

Question: I have input data as given below:

    Date      Investment Type                                   Medium
    1/1/2000  Mutual Fund, Stocks, Fixed Deposit, Real Estate   Own, Online,Through Agent
    1/2/2000  Mutual Fund, Stocks, Real Estate                  Own
    1/3/2000  Fixed Deposit                                     Online
    1/3/2000  Mutual Fund, Fixed Deposit, Real Estate           Through Agent
    1/2/2000  Stocks                                            Own, Online, Through Agent

The input to my function is Medium. It could be a single value or a list. I want to search the data based on the Medium input and then aggregate the data as given below. For
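A minimal pandas sketch, assuming the goal is to count how often each investment type appears among the rows whose Medium matches any of the requested values (the expected output in the question is cut off, so the counting step is a guess). `count_investments` is a hypothetical helper name. Splitting on commas and stripping whitespace also copes with the missing space in `Own, Online,Through Agent`.

```python
import pandas as pd

# Sample rows transcribed from the question.
df = pd.DataFrame({
    "Date": ["1/1/2000", "1/2/2000", "1/3/2000", "1/3/2000", "1/2/2000"],
    "Investment Type": [
        "Mutual Fund, Stocks, Fixed Deposit, Real Estate",
        "Mutual Fund, Stocks, Real Estate",
        "Fixed Deposit",
        "Mutual Fund, Fixed Deposit, Real Estate",
        "Stocks",
    ],
    "Medium": [
        "Own, Online,Through Agent",
        "Own",
        "Online",
        "Through Agent",
        "Own, Online, Through Agent",
    ],
})


def count_investments(df, medium):
    """Hypothetical helper: count investment types in rows matching `medium`.

    `medium` may be a single string or a list of strings; a row matches if
    its Medium column contains at least one of them.
    """
    wanted = {medium} if isinstance(medium, str) else set(medium)
    # Split on commas and strip spaces, so "Online,Through Agent" parses too.
    media = df["Medium"].str.split(",").apply(lambda xs: {x.strip() for x in xs})
    matched = df[media.apply(lambda s: bool(s & wanted))]
    # One entry per individual investment type, then count occurrences.
    types = matched["Investment Type"].str.split(",").explode().str.strip()
    return types.value_counts().sort_index()


print(count_investments(df, "Online"))
print(count_investments(df, ["Own", "Through Agent"]))
```

Accepting either a string or a list keeps the single-value and multi-value cases on one code path.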

Aggregating connected sets of nodes / edges

Question: I have a connected set of edges with unique nodes. They are connected using a parent node. Consider the following example code and illustration:

    CREATE TABLE network (
      node integer PRIMARY KEY,
      parent integer REFERENCES network(node),
      length numeric NOT NULL
    );
    CREATE INDEX ON network (parent);
    INSERT INTO network (node, parent, length) VALUES
      (1, NULL, 1.3), (2, 1, 1.2), (3, 2, 0.9), (4, 3, 1.4),
      (5, 4, 1.6), (6, 2, 1.5), (7, NULL, 1.0);

Visually, two groups of edges can be identified. How
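One common approach is a recursive CTE that starts from every root (a row whose `parent` is NULL) and carries the root's id down the tree, so every row gets tagged with its group. The sketch below runs that query on the question's data using Python's built-in `sqlite3`. Because the question is cut off, counting rows and summing `length` per group is an assumed aggregation. SQLite requires a named index, so the index name `network_parent_idx` is hypothetical; PostgreSQL accepts the same `WITH RECURSIVE` shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE network (
    node   integer PRIMARY KEY,
    parent integer REFERENCES network(node),
    length numeric NOT NULL
);
CREATE INDEX network_parent_idx ON network (parent);
INSERT INTO network (node, parent, length) VALUES
    (1, NULL, 1.3), (2, 1, 1.2), (3, 2, 0.9), (4, 3, 1.4),
    (5, 4, 1.6), (6, 2, 1.5), (7, NULL, 1.0);
""")

# Anchor: every root is its own group. Recursive step: a child inherits
# the root id of its parent. Then aggregate per root.
QUERY = """
WITH RECURSIVE tree(root, node, length) AS (
    SELECT node, node, length FROM network WHERE parent IS NULL
    UNION ALL
    SELECT t.root, n.node, n.length
    FROM network AS n JOIN tree AS t ON n.parent = t.node
)
SELECT root, COUNT(*) AS edges, SUM(length) AS total_length
FROM tree
GROUP BY root
ORDER BY root;
"""
groups = conn.execute(QUERY).fetchall()
for root, edges, total in groups:
    print(root, edges, total)
```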

What does a “standard formula interface to a data.frame” mean in R?

Question: The documentation for aggregate states:

    ‘aggregate.formula’ is a standard formula interface to ‘aggregate.data.frame’.

I am new to R, and I don't understand what this means. Please explain! Thanks! Uri

Answer 1: Jump to the middle of the examples section of help(aggregate) and you will see this:

    ## Formulas, one ~ one, one ~ many, many ~ one, and many ~ many:
    aggregate(weight ~ feed, data = chickwts, mean)
    aggregate(breaks ~ wool + tension, data = warpbreaks, mean)
    aggregate(cbind(Ozone, Temp) ~
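To make the idea concrete: the formula method is a front end that reads `lhs ~ rhs`, treats the left side as the columns to summarise and the right side as the grouping columns, and hands those to the data.frame method. The Python sketch below mirrors only that division of labour. The function names are hypothetical, this is not R's implementation, and the toy data is made up rather than taken from `chickwts`.

```python
from collections import defaultdict
from statistics import mean


def aggregate_data_frame(x, by, fun):
    """Rough analogue of aggregate.data.frame(x, by, FUN).

    `x` and `by` map column names to equal-length lists. Rows that share the
    same `by` values form a group, and `fun` is applied to each column of `x`.
    """
    groups = defaultdict(lambda: defaultdict(list))
    nrows = len(next(iter(by.values())))
    for i in range(nrows):
        key = tuple(col[i] for col in by.values())
        for name, col in x.items():
            groups[key][name].append(col[i])
    return {key: {name: fun(vals) for name, vals in cols.items()}
            for key, cols in groups.items()}


def aggregate_formula(formula, data, fun):
    """Rough analogue of aggregate.formula: parse the formula, then delegate.

    Left of '~': columns to summarise (cbind(a, b) for several).
    Right of '~': grouping columns joined by '+'.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    if lhs.startswith("cbind(") and lhs.endswith(")"):
        lhs = lhs[len("cbind("):-1]
    x_cols = [c.strip() for c in lhs.split(",")]
    by_cols = [c.strip() for c in rhs.split("+")]
    return aggregate_data_frame({c: data[c] for c in x_cols},
                                {c: data[c] for c in by_cols}, fun)


# Toy data (made up, not the real chickwts values).
toy = {"weight": [10, 20, 30, 50], "feed": ["a", "a", "b", "b"]}
print(aggregate_formula("weight ~ feed", toy, mean))
print(aggregate_data_frame({"weight": toy["weight"]}, {"feed": toy["feed"]}, mean))
```

Both calls produce the same grouped means; the formula version just spares you from building the `x` and `by` arguments by hand, which is the sense in which one is an "interface" to the other.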

cumsum using ddply

Question: I need to group by the level columns with ddply, or with aggregate if that's easier. I am not really sure how to do this, as I need to use cumsum as my aggregate function. This is what my data looks like:

    level1  level2  hour  product
    A       tea        0        7
    A       tea        1        2
    A       tea        2        9
    A       coffee    17        7
    A       coffee    18        2
    A       coffee    20        4
    B       coffee     0        2
    B       coffee     1        3
    B       coffee     2        4
    B       tea       21        3
    B       tea       22        1

Expected output:

    A       tea        0        7
    A       tea        1        9
    A       tea        2       18
    A       coffee    17        7
    A       coffee    18        9
    A       coffee    20       13
    B       coffee     0        2
    B       coffee     1        5
    B       coffee     2        9
    B       tea       21        3
    B
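The task is a running total of `product` within each (level1, level2) group. The same grouped running total can be sketched in Python with only the standard library: `itertools.groupby` merges consecutive rows with equal keys, and `itertools.accumulate` yields the running sum. This assumes, as in the question's data, that each (level1, level2) block is contiguous; sort the rows by those two columns first if it is not.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (level1, level2, hour, product) rows transcribed from the question.
rows = [
    ("A", "tea", 0, 7), ("A", "tea", 1, 2), ("A", "tea", 2, 9),
    ("A", "coffee", 17, 7), ("A", "coffee", 18, 2), ("A", "coffee", 20, 4),
    ("B", "coffee", 0, 2), ("B", "coffee", 1, 3), ("B", "coffee", 2, 4),
    ("B", "tea", 21, 3), ("B", "tea", 22, 1),
]


def grouped_cumsum(rows):
    """Replace `product` with its running total within each (level1, level2) run."""
    out = []
    # groupby only merges *adjacent* rows that share the same key.
    for _, group in groupby(rows, key=itemgetter(0, 1)):
        group = list(group)
        totals = accumulate(row[3] for row in group)
        out.extend(row[:3] + (total,) for row, total in zip(group, totals))
    return out


for row in grouped_cumsum(rows):
    print(*row)
```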

In R, how do you classify values in one data frame based on ranges in another data frame?

Question: In general, how could I classify values in one column of a data frame with respect to factor values in another data frame? For example, given df1 and df2, I would like to generate df3 (or update df1):

    > df1
      NewAge
    1      5
    2     25
    3     18
    4      9
    5     43
    6     15
    7     17
    > df2
      AgeStart AgeEnd AgeType
    1        0     10       A
    2       10     20       B
    3       20     30       A
    4       30     40       B
    5       40     50       A

I want df3 as:

    NewAge Type
         5    A
        25    A
        18    B
         9    A
        43    A
        15    B
        17    B

I used cut() to generate intervals:

    df2_cut <- data.frame(NewAge, "AgeRange" = cut(NewAge, breaks=AgeStart, right
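Because the ranges in df2 are sorted and do not overlap, each age can be located with a binary search on AgeStart, which is the idea behind R's findInterval(). A Python sketch using the standard `bisect` module, assuming half-open intervals [AgeStart, AgeEnd), so an age of exactly 10 falls in the second row; `classify` is a hypothetical helper, and ages outside every range get `None`.

```python
from bisect import bisect_right

# df2 transcribed from the question.
age_start = [0, 10, 20, 30, 40]
age_end = [10, 20, 30, 40, 50]
age_type = ["A", "B", "A", "B", "A"]


def classify(age):
    """Return the AgeType whose [AgeStart, AgeEnd) range contains `age`."""
    # Index of the last AgeStart that is <= age.
    i = bisect_right(age_start, age) - 1
    if i < 0 or age >= age_end[i]:
        return None  # below the first range or in a gap / past the end
    return age_type[i]


new_age = [5, 25, 18, 9, 43, 15, 17]  # df1$NewAge
types = [classify(a) for a in new_age]
for a, t in zip(new_age, types):
    print(a, t)
```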

Aggregating data by timespan in MySQL

Question: Basically, what I want is to aggregate some values in a table according to a timespan. I take snapshots of a system every 15 minutes, and I want to be able to draw a graph over a long period. Since the graphs get really confusing if too many points are shown (besides getting really slow to render), I want to reduce the number of points by aggregating multiple points into a single point, averaging over them. For this I'd have to be able to group by buckets that can be defined by me
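A common trick is to map each timestamp to the start of its bucket with integer division and then GROUP BY that value, so the bucket width becomes a single parameter. The self-contained sketch below uses Python's `sqlite3` and a hypothetical `snapshot` table that stores Unix seconds, filled with made-up readings. In MySQL the bucket expression would typically be built from UNIX_TIMESTAMP() together with the DIV integer-division operator instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: ts holds Unix seconds, reading is the sampled value.
conn.execute("CREATE TABLE snapshot (ts INTEGER, reading REAL)")
# Made-up readings every 15 minutes (900 s) over two hours.
conn.executemany("INSERT INTO snapshot VALUES (?, ?)",
                 [(i * 900, float(i)) for i in range(8)])

BUCKET = 3600  # bucket width in seconds; any span can be chosen here

# Integer division on INTEGER values truncates, so every ts in the same
# hour maps to the same bucket_start.
rows = conn.execute(
    """
    SELECT (ts / :width) * :width AS bucket_start,
           AVG(reading)           AS avg_reading,
           COUNT(*)               AS samples
    FROM snapshot
    GROUP BY bucket_start
    ORDER BY bucket_start
    """,
    {"width": BUCKET},
).fetchall()
for row in rows:
    print(row)
```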

How to sum in pandas by unique index in several columns?

Question: I have a pandas DataFrame which details online activities in terms of "clicks" during a user session. There are as many as 50,000 unique users, and the DataFrame has around 1.5 million samples. Obviously, most users have multiple records. The four columns are a unique user ID, the date when the user began the service ("Registration"), the date the user used the service ("Session"), and the total number of clicks. The organization of the DataFrame is as follows:

    User_ID Registration Session clicks
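The question is cut off before the desired output, so this is only a guess at the intent: `groupby` on several columns followed by `.sum()` collapses duplicate rows into one row per unique combination of those columns. The data below is made up to match the described column layout.

```python
import pandas as pd

# Made-up sample in the column layout described in the question.
df = pd.DataFrame({
    "User_ID": [1, 1, 1, 2, 2],
    "Registration": pd.to_datetime(["2011-01-01"] * 3 + ["2011-02-01"] * 2),
    "Session": pd.to_datetime(["2011-01-05", "2011-01-05", "2011-01-09",
                               "2011-02-03", "2011-02-03"]),
    "clicks": [3, 4, 1, 5, 2],
})

# One row per unique (User_ID, Registration, Session), clicks added up.
per_session = (
    df.groupby(["User_ID", "Registration", "Session"], as_index=False)["clicks"]
      .sum()
)

# Total clicks per user, ignoring dates.
per_user = df.groupby("User_ID")["clicks"].sum()

print(per_session)
print(per_user)
```

`as_index=False` keeps the grouping keys as ordinary columns, which is often easier to work with afterwards than a MultiIndex.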

ElasticSearch returning only documents with distinct value

Question: Let's say I have this data:

    { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] },
    { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] },
    { "name" : "GEORGE", "favorite_cars" : [ "honda","Hyundae" ] }

Whenever I query this data searching for people whose favorite car is toyota, it returns this:

    { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] },
    { "name" : "ABC", "favorite_cars" : [ "ferrari","toyota" ] }

The result is two records with the name ABC.

Can Rails' Active Record handle SQL aggregate queries?

Question: I just started learning Active Record and am wondering how best to retrieve data from multiple tables when an SQL aggregate query is involved. In the following example (from a medical app), I'm looking for the most recent events of various types for each patient (e.g. last visit, last lab test, etc.). As you can see from the SQL query below, I'm looking for the max(date) value from a grouped query. I resorted to find_by_sql to do this; however, I'd like to see how to do this without using find_by
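Whatever the ORM layer, the aggregate underneath is a GROUP BY with MAX. The question's own SQL is not shown, so the sketch below assumes a hypothetical `events` table with made-up rows and runs the query that returns the latest date of each event type per patient, using Python's `sqlite3`. ISO-8601 date strings compare correctly as text. In Active Record, calculation methods such as `group(...).maximum(...)` generate this kind of query without find_by_sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and made-up rows for illustration only.
conn.executescript("""
CREATE TABLE events (patient_id INTEGER, event_type TEXT, event_date TEXT);
INSERT INTO events VALUES
    (1, 'visit',   '2012-01-10'),
    (1, 'visit',   '2012-03-02'),
    (1, 'labtest', '2012-02-15'),
    (2, 'visit',   '2012-01-20');
""")

# Most recent date for every (patient, event type) pair.
latest = conn.execute("""
SELECT patient_id, event_type, MAX(event_date) AS last_date
FROM events
GROUP BY patient_id, event_type
ORDER BY patient_id, event_type
""").fetchall()
for row in latest:
    print(row)
```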

Aggregate vs Sum Performance in LINQ

Question: Three different implementations of finding the sum of an IEnumerable<int> source are given below, along with the time taken when the source has 10,000 integers:

    source.Aggregate(0, (result, element) => result + element);  // takes 3 ms
    source.Sum(c => c);                                          // takes 12 ms
    source.Sum();                                                // takes 1 ms

I am wondering why the second implementation is four times more expensive than the first one. Shouldn't it be the same as the third implementation?

Answer 1: Note: My computer is running .NET 4.5 RC, so it's possible