summarization

Efficient way to sum measurements / time series by given interval in PHP

Submitted by 雨燕双飞 on 2019-12-08 03:09:25

Question: I have a series of measurement data / time series at the same interval of 15 minutes. Furthermore, I have a given period (e.g. one day, the current week, month, year, ...) and I need to summarize the values by hour, day, month, and so on: for example, summarize all values of the last month, by day. My approach is, as a first step, to generate a temporary array with the needed intervals for the period. E.g. here in PHP (PHP is not strictly necessary; I would prefer Python or JavaScript if it provides a faster method):

    $this->tempArray = array(
        '2014-10-01T00:00:00+0100' => array(),
        '2014-10-02T00:00:00+0100' => array(),

dplyr standard evaluation: summarise_ with variable name for summed variable

Submitted by 天涯浪子 on 2019-12-07 14:41:12

Question: I went through a lot of questions similar to mine, but they only addressed part of my problem. I am using dplyr with standard evaluation to accommodate variable names. This works fine for filter_ and group_by_ in a pipe; however, for summarise_ I cannot use a variable name for the metric I'm summing. An example will make it clear.

    library(dplyr)
    library(lazyeval)

    # create data
    a <- data.frame(
      x    = c(2010, 2010, 2011, 2011, 2011),
      y_zm = rep(10, 5),
      y_r2 = rep(20, 5))

    # define
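The example is truncated above. A hedged sketch of the idiom the question is driving at, using the underscore verbs that were current when this was asked; the variables metric and out_name are names I have introduced:

```r
library(dplyr)
library(lazyeval)

a <- data.frame(
  x    = c(2010, 2010, 2011, 2011, 2011),
  y_zm = rep(10, 5),
  y_r2 = rep(20, 5))

metric   <- "y_zm"                 # column to sum, held in a variable
out_name <- paste0(metric, "_sum") # name for the summarised column

a %>%
  group_by(x) %>%
  summarise_(.dots = setNames(
    list(interp(~sum(v), v = as.name(metric))),
    out_name))
```

Note that summarise_ and the other underscore verbs were deprecated in dplyr 0.7 in favour of tidy evaluation, where the same is written roughly as summarise(!!out_name := sum(!!sym(metric))).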

Summing Multiple Groups of Columns

Submitted by 本秂侑毒 on 2019-12-05 02:23:07

Question: My data frame contains the results of image analysis, where each column is the proportion of a particular class present in the image, such that an example data frame class_df looks like:

    id    A    B    C    D    E    F
     1 0.20 0.30 0.10 0.15 0.25 0.00
     2 0.05 0.10 0.05 0.30 0.10 0.40
     3 0.10 0.10 0.10 0.20 0.20 0.30

Each of these classes belongs to a functional group, and I want to create new columns in which the proportion of each functional group is calculated from its classes. An example mapping class_fg:

    class fg
    A     Z
    B     Z
    C     Z
    D     Y
    E     Y
    F     X

and the desired result would be (line added to
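The question is cut off above. A minimal base R sketch of one way to do this, assuming the class_df and class_fg shown; the object names fg_cols and fg_sums are my own:

```r
# hypothetical reconstruction of the mapping table from the question
class_fg <- data.frame(class = c("A", "B", "C", "D", "E", "F"),
                       fg    = c("Z", "Z", "Z", "Y", "Y", "X"),
                       stringsAsFactors = FALSE)

# split the class names by functional group, then row-sum each group of columns
fg_cols <- split(class_fg$class, class_fg$fg)
fg_sums <- sapply(fg_cols, function(cols)
  rowSums(class_df[, cols, drop = FALSE]))

# append one column per functional group (X, Y, Z) to the original data
result <- cbind(class_df, fg_sums)
```

Because the mapping lives in a data frame, adding or reassigning classes only changes class_fg, not the summing code.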

How to aggregate count of unique values of categorical variables in R

Submitted by 北城以北 on 2019-12-04 09:57:29

Question: Suppose I have a data set data:

    x1 <- c("a","a","a","a","a","a","b","b","b","b")
    x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
    data <- data.frame(x1, x2)

    x1 x2
    a  a1
    a  a1
    a  a2
    a  a1
    a  a2
    a  a3
    b  b1
    b  b1
    b  b2
    b  b2

I want to find the number of unique values of x2 corresponding to each value of x1. For example, a has only 3 unique values (a1, a2 and a3) and b has 2 (b1 and b2). I used aggregate(x1~., data, sum), but it did not work since these are factors, not integers. Please help.

Answer 1: Try

    aggregate(x2 ~ x1, data, FUN = function(x) length(unique(x)))
    #  x1 x2
    #1  a  3
    #2  b  2

Or

    rowSums(table(unique(data)))

MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

Submitted by 我们两清 on 2019-12-03 23:45:43

Question: Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE so that the summarization overwrites earlier calculations, but we are having difficulty because one of the columns in the summary table's UNIQUE KEY is an optional FK and contains NULL values. These NULLs are intended to mean "not present, and all such cases are equivalent". Of course, MySQL usually treats NULLs as meaning "unknown, and all such cases are not equivalent". Basic structure is as follows: An "Activity" table containing an

dplyr idiom for summarize() a filtered-group-by, and also replace any NAs due to missing rows

Submitted by 柔情痞子 on 2019-12-03 16:25:11

Question: I am computing a dplyr::summarize across a dataframe of sales data. I do a group-by (S, D, Y), then within each group compute medians and means for weeks 5..43, then merge those back into the parent df. Variable X is sales. X is never NA (i.e. there are no explicit NAs anywhere in df), but if there is no data (as in, no sales) for a given S, D, Y and set of weeks, there will simply be no row with those values in df (take it that this means zero sales for that particular set of parameters). In other words, I want to impute X = 0 in any structurally missing rows (but I hope I don't need to melt/cast the original df,