aggregate | 易学教程

How does a site like kayak.com aggregate content? [closed]

Question: Greetings, I've been toying with an idea for a new project and was wondering if anyone has any idea on how a service like Kayak.com is able to aggregate data from so many sources so quickly and accurately. More specifically, do you think Kayak.com is interacting with APIs or are

Aggregate by multiple columns and reshape from long to wide

Question: There are some questions similar to this topic on SO, but none exactly like my use case. I have a dataset where the columns are laid out as shown below:
    Id  Description  Value
    10  Cat          19
    10  Cat          20
    10  Cat          5
    10  Cat          13
    11  Cat          17
    11  Cat          23
    11  Cat          7
    11  Cat          14
    10  Dog          19
    10  Dog          20
    10  Dog          5
    10  Dog          13
    11  Dog          17
    11  Dog          23
    11  Dog          7
    11  Dog          14
What I am trying to do is capture the mean of the Value column by Id and Description. The final dataset would look like this:
    Id  Cat    Dog
    10  14.25  14.25
    11  15.25  15.25
I can do
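One possible way to go from the long layout to a wide table of means is to aggregate first and reshape second. The sketch below uses only base R (aggregate and reshape); the variable names df, long and wide are illustrative and do not come from the original post.

```r
# Rebuild the sample data shown in the question.
df <- data.frame(
  Id          = rep(c(10, 11, 10, 11), each = 4),
  Description = rep(c("Cat", "Dog"), each = 8),
  Value       = rep(c(19, 20, 5, 13, 17, 23, 7, 14), times = 2)
)

# Step 1: mean of Value for every Id/Description pair (still long format).
long <- aggregate(Value ~ Id + Description, data = df, FUN = mean)

# Step 2: spread the Description levels into columns (wide format).
wide <- reshape(long, idvar = "Id", timevar = "Description", direction = "wide")

# reshape() prefixes the new columns with "Value."; strip that prefix.
names(wide) <- sub("^Value\\.", "", names(wide))
print(wide)
```

Other tools (for example tapply, or add-on packages) can do the same in one step; this version is only meant to show the two stages separately.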

R: calculate the number of occurrences of a specific event in a specified time future

Question: My simplified data looks like this:
    set.seed(1453)
    x = sample(0:1, 10, TRUE)
    date = c('2016-01-01', '2016-01-05', '2016-01-07', '2016-01-12', '2016-01-16',
             '2016-01-20', '2016-01-20', '2016-01-25', '2016-01-26', '2016-01-31')
    df = data.frame(x, date = as.Date(date))
    df
    x  date
    1  2016-01-01
    0  2016-01-05
    1  2016-01-07
    0  2016-01-12
    0  2016-01-16
    1  2016-01-20
    1  2016-01-20
    0  2016-01-25
    0  2016-01-26
    1  2016-01-31
I'd like to calculate the number of occurrences for x == 1 within a specified time period
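As a rough sketch of one reading of this question, the code below counts, for every row, how many x == 1 events fall in the days that follow it. The window length (window_days = 7), the choice to exclude the row's own date, and the column name n_future are all assumptions, since the excerpt does not specify them. The x values are typed in from the printed output rather than regenerated, because sample() can return different draws in different R versions.

```r
# Data as printed in the question (x copied from the printed data frame).
x <- c(1, 0, 1, 0, 0, 1, 1, 0, 0, 1)
date <- as.Date(c("2016-01-01", "2016-01-05", "2016-01-07", "2016-01-12",
                  "2016-01-16", "2016-01-20", "2016-01-20", "2016-01-25",
                  "2016-01-26", "2016-01-31"))
df <- data.frame(x, date)

window_days <- 7  # assumed look-ahead window, not given in the question

# For each row, count events with x == 1 that happen strictly after the
# row's date and no later than window_days days after it.
df$n_future <- sapply(seq_len(nrow(df)), function(i) {
  in_window <- df$date > df$date[i] & df$date <= df$date[i] + window_days
  sum(df$x[in_window] == 1)
})
print(df)
```

This loops over rows, so it is quadratic in the number of rows; that is fine for small data but may need a different approach for large tables.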

R: How to get the last element from each group?

Question: I have a data frame containing a time series with two time stamp columns, d$day and d$time, and, say for simplicity, one measured variable, d$val1. Suppose I want to examine the situation at the close of each day's experiment, i.e. the last measurement, if it exists. (Not every day has a measurement, and measurements can be taken at different times each day.) I would like to be able to aggregate by day and use some sort of last() or tail() function on time to pull back the corresponding val
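The idea of combining aggregate with tail can be sketched as follows. The sample values are invented for illustration, and d$time is assumed to be a zero-padded "HH:MM" string so that sorting it as text gives chronological order; only the column names day, time and val1 come from the question.

```r
# Hypothetical measurements; the rows are deliberately out of order.
d <- data.frame(
  day  = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01",
                   "2020-01-02", "2020-01-02")),
  time = c("17:30", "08:15", "09:00", "16:45", "12:00"),
  val1 = c(3.4, 5.6, 1.2, 9.0, 7.8)
)

# Sort chronologically so the last row of each day is the latest measurement.
d <- d[order(d$day, d$time), ]

# Take the last val1 within each day; days without rows simply do not appear.
last_per_day <- aggregate(val1 ~ day, data = d, FUN = function(v) tail(v, 1))
print(last_per_day)
```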

How to get last data for each id/date?

Question: I have a data frame that contains an id and a POSIXct (date and time) column:
    > myData
      Tpt_ID  Tpt_DateTime             Value
    1 1       2013-01-01 15:17:21 CST  10
    2 2       2013-01-01 15:18:32 CST  5
    3 3       2013-01-01 16:00:02 CST  1
    4 1       2013-01-02 15:10:11 CST  15
    5 2       2013-02-02 11:18:32 CST  6
    6 3       2013-02-03 12:00:02 CST  2
    7 1       2013-01-01 19:17:21 CST  21
    8 2       2013-02-02 20:18:32 CST  8
    9 3       2013-02-03 22:00:02 CST  3
I'd like to get the last Value for each date and ID. For example:
    Tpt_ID  Tpt_DateTime             Value
    2       2013-01-01 15:18:32 CST  5
    3       2013-01-01 16
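A minimal base R sketch of one approach: derive a calendar date, sort by ID and time, and keep the last row of every ID/date pair. The helper column Date and the names ordered and last_rows are illustrative. The time zone is set to UTC here as an assumption to keep the example portable; the original data is labelled CST.

```r
# Sample data from the question (time zone simplified to UTC, an assumption).
myData <- data.frame(
  Tpt_ID = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Tpt_DateTime = as.POSIXct(c(
    "2013-01-01 15:17:21", "2013-01-01 15:18:32", "2013-01-01 16:00:02",
    "2013-01-02 15:10:11", "2013-02-02 11:18:32", "2013-02-03 12:00:02",
    "2013-01-01 19:17:21", "2013-02-02 20:18:32", "2013-02-03 22:00:02"
  ), tz = "UTC"),
  Value = c(10, 5, 1, 15, 6, 2, 21, 8, 3)
)

# Helper column holding only the calendar date.
myData$Date <- as.Date(myData$Tpt_DateTime, tz = "UTC")

# Sort so that, within each ID/date pair, the latest timestamp comes last.
ordered <- myData[order(myData$Tpt_ID, myData$Tpt_DateTime), ]

# Keep the last row of each ID/date pair.
last_rows <- ordered[!duplicated(ordered[c("Tpt_ID", "Date")], fromLast = TRUE), ]
print(last_rows)
```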

Fastest way to count occurrences of each unique element

Question: What is the fastest way to compute the number of occurrences for each unique element in a vector in R? So far, I've tried the following five functions:
    f1 <- function(x) {
      aggregate(x, by=list(x), FUN=length)
    }
    f2 <- function(x) {
      r <- rle(x)
      aggregate(r$lengths, by=list(r$values), FUN=sum)
    }
    f3 <- function(x) {
      u <- unique(x)
      data.frame(Group=u, Counts=vapply(u, function(y)sum(x==y), numeric(1)))
    }
    f4 <- function(x) {
      r <- rle(x)
      u <- unique(r$values)
      data.frame(Group=u, Counts=vapply(u,
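For comparison, base R's table() is a common way to count occurrences. The sketch below is not one of the functions from the question and makes no claim about being the fastest; timing it against f1 to f4 on real data would be needed to say anything about speed. The input vector and the name result are made up for illustration.

```r
# Small illustrative input vector.
x <- c(3, 1, 3, 2, 1, 3)

# table() counts how often each distinct value occurs (sorted by value).
counts <- table(x)

# Convert to the Group/Counts layout used by the functions in the question.
result <- data.frame(
  Group  = as.numeric(names(counts)),
  Counts = as.vector(counts)
)
print(result)
```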

Design Patterns: The Iterator Pattern

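Definition: The Iterator pattern provides a way to access the elements of an aggregate object sequentially without exposing the object's internal representation. Traversal is common in Java development, as in the following program:
    for (int i = 0; i < arr.length; i++) {
        System.out.println(arr[i]);
    }
In the for statement, i++ increments i by 1 on each pass, moving to the next element. Abstracting and generalizing the role of the loop variable gives the pattern known in design patterns as the Iterator pattern.
Scenario: put books (Book) on a bookshelf (BookShelf) and display the book titles in order.
UML:
    Name               Description
    Aggregate          Interface representing a collection
    Iterator           Interface for traversing a collection
    Book               Class representing a book
    BookShelf          Class representing a bookshelf
    BookShelfIterator  Class that traverses the bookshelf
    Main               Test class
Example program:
Aggregate interface: the interface of the collection to be traversed. A class that implements this interface becomes a collection that can hold multiple elements, similar to an array.
    public interface Aggregate {
        public abstract Iterator iterator();
    }
The method declared in the Aggregate interface is iterator, which creates an iterator used for traversal.
Iterator interface: traverses the elements of the collection, playing the same role as the loop variable in a loop statement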

Returning first row of group

Question: I have a dataframe consisting of an ID that is the same for each element in a group, two datetimes, and the time interval between the two. One of the datetime objects is my relevant time marker. Now I'd like to get a subset of the dataframe that consists of the earliest entry for each group. The entries (especially the time interval) need to stay untouched. My first approach was to sort the frame according to 1. ID and 2. relevant datetime. However, I wasn't able to return the first entry
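The sort-then-pick-first approach described in the question could be finished along these lines. The data and the column names ID, start and interval_h are hypothetical, and only one of the two datetime columns is included to keep the sketch short.

```r
# Hypothetical data: a group ID, the relevant datetime, and an interval in hours.
df <- data.frame(
  ID = c("a", "a", "b", "b", "b"),
  start = as.POSIXct(c("2020-01-02 10:00:00", "2020-01-01 09:00:00",
                       "2020-01-05 12:00:00", "2020-01-03 08:00:00",
                       "2020-01-04 18:00:00"), tz = "UTC"),
  interval_h = c(2, 5, 1, 3, 4)
)

# Sort by group and then by the relevant datetime.
sorted <- df[order(df$ID, df$start), ]

# duplicated() flags every repeat of an ID, so negating it keeps the first
# (earliest) row of each group with all other columns left untouched.
first_rows <- sorted[!duplicated(sorted$ID), ]
print(first_rows)
```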

Apply several summary functions on several variables by group in one call

Question: I have the following data frame:
    x <- read.table(text = " id1 id2 val1 val2
    1 a x 1 9
    2 a x 2 4
    3 a y 3 5
    4 a y 4 9
    5 b x 1 7
    6 b y 4 4
    7 b x 3 9
    8 b y 2 8", header = TRUE)
I want to calculate the mean of val1 and val2 grouped by id1 and id2, and simultaneously count the number of rows for each id1-id2 combination. I can perform each calculation separately:
    # calculate mean
    aggregate(. ~ id1 + id2, data = x, FUN = mean)
    # count rows
    aggregate(. ~ id1 + id2, data = x, FUN = length)
In order to
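One way to get both summaries from a single aggregate call is to let FUN return a named vector. This is a sketch rather than the only option; the result name res and the labels mean and n are illustrative choices.

```r
x <- read.table(text = " id1 id2 val1 val2
1 a x 1 9
2 a x 2 4
3 a y 3 5
4 a y 4 9
5 b x 1 7
6 b y 4 4
7 b x 3 9
8 b y 2 8", header = TRUE)

# FUN returns two named values, so each of val1 and val2 becomes a
# two-column matrix inside the aggregated data frame.
res <- aggregate(cbind(val1, val2) ~ id1 + id2, data = x,
                 FUN = function(v) c(mean = mean(v), n = length(v)))

# Flatten the matrix columns into ordinary columns such as val1.mean, val1.n.
res <- do.call(data.frame, res)
print(res)
```

The flattening step matters because, without it, names(res) still shows only id1, id2, val1 and val2 even though four summary columns are stored.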
