aggregate | 易学教程

Select row prior to first occurrence of an event by group

Question: I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table identifies whether a certain animal is seen (status == 1) or not (status == 0) on each day.

       id       date status
    1   1 2014-06-20      1
    2   1 2014-06-21      1
    3   1 2014-06-22      1
    4   1 2014-06-23      1
    5   1 2014-06-24      0
    6   2 2014-06-20      1
    7   2 2014-06-21      1
    8   2 2014-06-22      0
    9   2 2014-06-23      1
    10  2 2014-06-24      1
    11  3 2014-06-20      1
    12  3 2014-06-21      1
    13  3 2014-06-22      0
    14  3 2014-06-23      1
    15  3 2014-06-24      0
    16  4
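The excerpt is cut off before the desired output is stated, so the following is only a minimal base R sketch of what the title describes. It assumes that the "event" is the first status == 0 within each id and that rows are already sorted by date within each id. The names obs and prior_rows are hypothetical, and only the three complete groups from the sample are used.

```r
# Sample data for ids 1 to 3, copied from the table above
obs <- data.frame(
  id     = rep(1:3, each = 5),
  date   = rep(as.Date("2014-06-20") + 0:4, times = 3),
  status = c(1, 1, 1, 1, 0,
             1, 1, 0, 1, 1,
             1, 1, 0, 1, 0)
)

# For each id, find the first row with status == 0 and keep the row before it.
# Groups with no zero, or whose first row is already zero, contribute nothing.
prior_rows <- do.call(rbind, lapply(split(obs, obs$id), function(g) {
  first_zero <- which(g$status == 0)[1]   # NA when the group has no zero
  if (is.na(first_zero) || first_zero == 1) return(NULL)
  g[first_zero - 1, ]
}))

print(prior_rows)
```

Splitting by id and indexing inside each piece keeps the logic explicit; the same idea could be expressed with a grouping package, but that is not assumed here.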

add rows in a data.table but not when certain columns take same values

Question: I have a data.table dat with 4 columns, say (col1, col2, col3, col4). Input data:

    structure(list(col1 = c(5.1, 5.1, 4.7, 4.6, 5, 5.1, 5.1, 4.7, 4.6, 5),
                   col2 = c(3.5, 3.5, 3.2, 3.1, 3.6, 3.5, 3.5, 3.2, 3.1, 3.6),
                   col3 = c(1.4, 1.4, 1.3, 1.5, 1.4, 3.4, 3.4, 1.3, 1.5, 1.4),
                   col4 = structure(c(1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L),
                                    .Label = c("setosa", "versicolor", "virginica", "eer"),
                                    class = "factor")),
              .Names = c("col1", "col2", "col3", "col4"),
              row.names = c(NA, -10L),
              class = c(

Aggregate a data frame in R by equally spaced time intervals

Question: I want to aggregate the data by time and create equally spaced time intervals:

    date <- c(as.POSIXct("2011-08-08 21:00:00"), as.POSIXct("2011-08-08 21:26:00"))
    value <- c(1, 2)
    dt <- data.frame(date, value)
    DT <- aggregate(cbind(dt$value), list(cut(dt$date, breaks = "10 min")), sum)

dt:

    2011-08-08 21:00:00 1
    2011-08-08 21:26:00 2

DT:

    2011-08-08 21:00:00 1
    2011-08-08 21:20:00 2

What I want:

    2011-08-08 21:00:00 1
    2011-08-08 21:10:00 NA
    2011-08-08 21:20:00 2

Is there any way to do this without using zoo or
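One possible base R approach to the output described above, offered as a sketch rather than a definitive answer: cut() on date-times keeps a level for every 10-minute interval, including empty ones, and tapply() returns NA for levels with no observations, whereas aggregate() drops them. The explicit tz = "UTC" and the names bin and sums are assumptions added here to make the example reproducible.

```r
# Same data as in the question; the time zone is fixed to UTC (an assumption)
date  <- as.POSIXct(c("2011-08-08 21:00:00", "2011-08-08 21:26:00"), tz = "UTC")
value <- c(1, 2)
dt <- data.frame(date, value)

# Assign each timestamp to its 10-minute interval; empty intervals remain as levels
dt$bin <- cut(dt$date, breaks = "10 min")

# Sum per interval; tapply() yields NA for intervals with no rows
sums <- tapply(dt$value, dt$bin, sum)

# Rebuild a data frame with one row per interval
DT <- data.frame(date  = as.POSIXct(names(sums), tz = "UTC"),
                 value = as.vector(sums))

print(DT)
```

The design choice is to let the factor levels carry the full, evenly spaced grid, so no separate sequence of timestamps has to be built and merged.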

Error when calculating a running total (cumulative over the previous periods)

Question: I have a table, let's call it My_Table, that has a Created datetime column (in SQL Server). I'm trying to pull a report that shows historically how many rows were to My_Table by month over a particular time. Now I know that I can show how many were added each month with:

    SELECT YEAR(MT.Created), MONTH(MT.Created), COUNT(*) AS [Total Added]
    FROM My_Table MT
    GROUP BY YEAR(MT.Created), MONTH(MT.Created)
    ORDER BY YEAR(MT.Created), MONTH(MT.Created)

Which would return something like:

    YEAR MONTH

R: “Binning” categorical variables

Question: I have a data.frame which has 13 columns with factors. One of the columns contains credit rating data and has 54 different values:

    levels(TR_factor$crclscod)
     [1] "A"  "A2" "AA" "B"  "B2" "BA" "C"  "C2" "C5" "CA" "CC" "CY" "D"
    [14] "D2" "D4" "D5" "DA" "E"  "E2" "E4" "EA" "EC" "EF" "EM" "G"  "GA"
    [27] "GY" "H"  "I"  "IF" "J"  "JF" "K"  "L"  "M"  "O"  "P1" "TP" "U"
    [40] "U1" "V"  "V1" "W"  "Y"  "Z"  "Z1" "Z2" "Z4" "Z5" "ZA" "ZY"

What I want is to "bin" those categories into something like levels(TR_factor
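The target bins are cut off in the excerpt, so the grouping below is purely hypothetical: it collapses levels by their first letter, just to illustrate the general mechanism. In R, assigning the same name to several levels of a factor merges them. The toy factor crclscod and the result binned are illustrative names, built from a few of the levels listed above.

```r
# Toy factor using a handful of the credit rating codes listed above
crclscod <- factor(c("A", "A2", "AA", "B", "B2", "Z", "Z1", "ZA"))

# Hypothetical binning rule: keep only the first letter of each level.
# Levels that end up with the same name are merged into one.
binned <- crclscod
levels(binned) <- substr(levels(binned), 1, 1)

print(table(binned))
```

For an arbitrary mapping, the same replacement form also accepts a named list, for example levels(f) <- list(new_name = c("old1", "old2")), where each list element names the old levels to merge.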

Update statement containing aggregate not working in SQL server

Question: I am hoping someone can help with my syntax here. I have two tables, ansicache..encounters and ansicache..x_refclaim_Table. The encounters table has an encounter column that matches the patacctnumber column in the x_refclaim_table. However, sometimes the patacctnumber can show up twice in the x_refclaim_table with different service dates (column iar_servicedate). I am trying to update the admitted column of the encounters table to the maximum value of iar_servicedate where the encounter in

Adding a non-aggregated column to an aggregated data set based on the aggregation of another column

Question: Is it possible to use the aggregate function to add another column from the original data frame, without actually using that column to aggregate the data? This is a very simplified version of the data that will help illustrate my question (let's call it data):

    name    result.1 result.2   replicate day data.for.mean
    "obj.1" 1        "good"     1         1   5
    "obj.1" 1        "good"     2         1   7
    "obj.1" 1        "great"    1         2   6
    "obj.1" 1        "good"     2         2   9
    "obj.1" 2        "bad"      1         1   10
    "obj.1" 2        "not good" 2         1   6
    "obj.1" 2        "bad"      1         2   5
    "obj.1" 2        "not good" 2         2   3

Arity of aggregate in logarithmic time

Question: How can the arity of an aggregate be determined in logarithmic (at least base two) compilation time (strictly speaking, in a logarithmic number of instantiations)? What I can currently do is achieve the desired result in linear time:

    #include <type_traits>
    #include <utility>

    struct filler
    {
        template< typename type >
        operator type ();
    };

    template< typename A, typename index_sequence = std::index_sequence<>, typename = void >
    struct aggregate_arity
        : index_sequence
    {
    };

    template< typename A, std::size_t ...indices

aggregate with empty factor but keep row

Question: I had a similar question with by(), where I accepted the fact that I had to manually replace the resulting NAs. Now I would like to aggregate my data.frame and keep the structure. For example, my larger data set has factors for 100 countries * 10 years * 5 segments, so it should reduce to 5000 rows. But sometimes some of the segment factors are empty and I only get <5000 rows. I cannot get my head around it... My MWE still applies:

    #All 3 categories are used
    df1<-data.frame(
      val=rep(seq(1:4),3),

R aggregate gives differently structured results using subsets from the same data

Question: I'm making diurnal cycles of windspeed based on a dataframe (ball) of several years' hourly data. I want to plot them by season, so I subset out the dates I need and join them like this:

    b8 = subset(ball, as.Date(date)>="2008-09-01 00:00:00, GMT" & as.Date(date)<= "2008-11-30 23:00:00, GMT" )
    b9 = subset(ball, as.Date(date)>="2009-09-01 00:00:00, GMT" & as.Date(date)<= "2009-11-30 23:00:00, GMT" )
    b10 = subset(ball, as.Date(date)>="2010-09-01 00:00:00, GMT" & as.Date(date)<= "2010-11-30 23:00