summarization

Efficient way to sum measurements / time series by given interval in PHP

Submitted by 雨燕双飞 on 2019-12-08 03:09:25

Question: I have a series of measurement data / time series at the same interval of 15 minutes. Furthermore, I have a given period (e.g. one day, the current week, month, year, ...) and I need to summarize the values by hour, day, month, and so on: for example, summarize all values of the last month, by day. My approach is, as a first step, to generate a temporary array with the needed intervals for the period. E.g. here in PHP (PHP is not strictly necessary; I would prefer Python or JavaScript if it provides a faster method):

    $this->tempArray = array(
        '2014-10-01T00:00:00+0100' => array(),
        '2014-10-02T00:00:00+0100' => array(),

dplyr standard evaluation: summarise_ with variable name for summed variable

Submitted by 天涯浪子 on 2019-12-07 14:41:12

Question: I went through a lot of questions similar to mine, but they only addressed part of my problem. I am using dplyr with standard evaluation to accommodate variable names. This works fine for filter_ and group_by_ in a pipe; however, for summarise_ I cannot use a variable name for the metric I'm summing. An example will make it clear.

    library(dplyr)
    library(lazyeval)

    # create data
    a <- data.frame(
      x    = c(2010, 2010, 2011, 2011, 2011),
      y_zm = rep(10, 5),
      y_r2 = rep(20, 5))

    # define
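The example is truncated above. A hedged sketch of the idiom the question is driving at, using the underscore verbs that were current when this was asked; the variables metric and out_name are names I have introduced:

```r
library(dplyr)
library(lazyeval)

a <- data.frame(
  x    = c(2010, 2010, 2011, 2011, 2011),
  y_zm = rep(10, 5),
  y_r2 = rep(20, 5))

metric   <- "y_zm"                 # column to sum, held in a variable
out_name <- paste0(metric, "_sum") # name for the summarised column

a %>%
  group_by(x) %>%
  summarise_(.dots = setNames(
    list(interp(~sum(v), v = as.name(metric))),
    out_name))
```

Note that summarise_ and the other underscore verbs were deprecated in dplyr 0.7 in favour of tidy evaluation, where the same is written roughly as summarise(!!out_name := sum(!!sym(metric))).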

Summing Multiple Groups of Columns

Submitted by 本秂侑毒 on 2019-12-05 02:23:07

Question: My data frame contains the results of image analysis, where each column is the proportion of a particular class present in the image, such that an example data frame class_df looks like:

    id    A    B    C    D    E    F
     1 0.20 0.30 0.10 0.15 0.25 0.00
     2 0.05 0.10 0.05 0.30 0.10 0.40
     3 0.10 0.10 0.10 0.20 0.20 0.30

Each of these classes belongs to a functional group, and I want to create new columns in which the proportion of each functional group is calculated from its classes. An example mapping class_fg:

    class fg
    A     Z
    B     Z
    C     Z
    D     Y
    E     Y
    F     X

and the desired result would be (line added to
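The question is cut off above. A minimal base R sketch of one way to do this, assuming the class_df and class_fg shown; the object names fg_cols and fg_sums are my own:

```r
# hypothetical reconstruction of the mapping table from the question
class_fg <- data.frame(class = c("A", "B", "C", "D", "E", "F"),
                       fg    = c("Z", "Z", "Z", "Y", "Y", "X"),
                       stringsAsFactors = FALSE)

# split the class names by functional group, then row-sum each group of columns
fg_cols <- split(class_fg$class, class_fg$fg)
fg_sums <- sapply(fg_cols, function(cols)
  rowSums(class_df[, cols, drop = FALSE]))

# append one column per functional group (X, Y, Z) to the original data
result <- cbind(class_df, fg_sums)
```

Because the mapping lives in a data frame, adding or reassigning classes only changes class_fg, not the summing code.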

How to aggregate count of unique values of categorical variables in R

Submitted by 北城以北 on 2019-12-04 09:57:29

Question: Suppose I have a data set data:

    x1 <- c("a","a","a","a","a","a","b","b","b","b")
    x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
    data <- data.frame(x1, x2)

    x1 x2
    a  a1
    a  a1
    a  a2
    a  a1
    a  a2
    a  a3
    b  b1
    b  b1
    b  b2
    b  b2

I want to find the number of unique values of x2 corresponding to each value of x1. For example, a has only 3 unique values (a1, a2 and a3) and b has 2 (b1 and b2). I used aggregate(x1~., data, sum), but it did not work since these are factors, not integers. Please help.

Answer 1: Try

    aggregate(x2 ~ x1, data, FUN = function(x) length(unique(x)))
    #  x1 x2
    #1  a  3
    #2  b  2

Or

    rowSums(table(unique(data)))

MySQL ON DUPLICATE KEY UPDATE with nullable column in unique key

Submitted by 我们两清 on 2019-12-03 23:45:43

Question: Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE so that the summarization overwrites earlier calculations, but we are having difficulty because one of the columns in the summary table's UNIQUE KEY is an optional FK and contains NULL values. These NULLs are intended to mean "not present, and all such cases are equivalent". Of course, MySQL usually treats NULLs as meaning "unknown, and all such cases are not equivalent". Basic structure is as follows: An "Activity" table containing an

dplyr idiom for summarize() a filtered-group-by, and also replace any NAs due to missing rows

Submitted by 柔情痞子 on 2019-12-03 16:25:11

Question: I am computing a dplyr::summarize across a dataframe of sales data. I do a group-by (S, D, Y), then within each group compute medians and means for weeks 5..43, then merge those back into the parent df. Variable X is sales. X is never NA (i.e. there are no explicit NAs anywhere in df), but if there is no data (as in, no sales) for a given S, D, Y and set of weeks, there will simply be no row with those values in df (take it that this means zero sales for that particular set of parameters). In other words, I want to impute X = 0 in any structurally missing rows (but I hope I don't need to melt/cast the original df,