group-by

dplyr: Subtracting values group-wise by group that matches given condition

Submitted by China☆狼群 on 2020-07-07 07:08:25
Question: Right now I'm refactoring a 'base'-based R script to use 'dplyr' instead. Basically, I want to group_by gene and subtract the values group-wise by a group that matches a given condition. In this case, I want the values where gene == 'C' and want to subtract them from all the others. Simplified data:

    x <- data.frame('gene'   = c('A','A','A','B','B','B','C','C','C'),
                    'sample' = rep_len(c('wt','mut1','mut2'), 3),
                    'value'  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

      gene sample value
    1    A     wt  32.3
    2    A   mut1  31.0
    3    A
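The excerpt cuts off before any answer. A minimal dplyr sketch of one reading of the problem, assuming each gene has exactly one value per sample and the subtraction should be matched sample-wise; the column name `adjusted` is illustrative:

    library(dplyr)

    x <- data.frame(gene   = c('A','A','A','B','B','B','C','C','C'),
                    sample = rep_len(c('wt','mut1','mut2'), 3),  # length-3 vector recycled by data.frame()
                    value  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

    # Group by sample so each gene's value sits next to the matching
    # reference measurement, then subtract the gene == 'C' value.
    x %>%
      group_by(sample) %>%
      mutate(adjusted = value - value[gene == 'C']) %>%
      ungroup()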

clickhouse downsample into OHLC time bar intervals

Submitted by 三世轮回 on 2020-07-06 20:12:01
Question: For a table containing e.g. a date/price timeseries with prices every millisecond, how can this be downsampled into groups of open-high-low-close (OHLC) rows with a time interval of e.g. one minute?

Answer 1: While the option with arrays will work, the simplest option here is to use a combination of GROUP BY on time intervals with the min, max, argMin, argMax aggregate functions.

    SELECT id, minute,
           max(value) AS high,
           min(value) AS low,
           avg(value) AS avg,
           argMin(value, timestamp) AS first,
           argMax(value,
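The answer's query is truncated above. A complete sketch along the same lines, assuming a table named ticks with columns id, timestamp, and value; ClickHouse's toStartOfMinute truncates a timestamp to its minute, and argMin/argMax pick the value at the earliest/latest timestamp in each group:

    SELECT
        id,
        toStartOfMinute(timestamp) AS minute,
        argMin(value, timestamp)   AS open,   -- value at the earliest timestamp in the minute
        max(value)                 AS high,
        min(value)                 AS low,
        argMax(value, timestamp)   AS close   -- value at the latest timestamp in the minute
    FROM ticks
    GROUP BY id, minute
    ORDER BY id, minute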

Group Array with count

Submitted by 浪子不回头ぞ on 2020-07-03 04:42:47
Question: I have an array of items that contains several properties. One of the properties is an array of tags. What is the best way to get all the tags used in those items, ordered by the number of times those tags are used on those items? I've been looking at underscore.js but am not getting the expected results. return _.groupBy(items, 'tags'); Example of my data:

    item1
      - itemName: item1
      - tags (array): tag1, tag2
    item2
      - itemName: item2
      - tags (array): tag1, tag3

so I'm trying
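The excerpt cuts off before any answer. _.groupBy keys on the whole tags array rather than the individual tags, which is likely why it didn't give the expected result. A minimal underscore.js sketch (the data literal is reconstructed from the description above): flatten all the tag arrays, count occurrences, and sort by count descending:

    var _ = require('underscore');

    var items = [
      { itemName: 'item1', tags: ['tag1', 'tag2'] },
      { itemName: 'item2', tags: ['tag1', 'tag3'] }
    ];

    var tagCounts = _.chain(items)
      .pluck('tags')    // [['tag1','tag2'], ['tag1','tag3']]
      .flatten()        // ['tag1', 'tag2', 'tag1', 'tag3']
      .countBy()        // { tag1: 2, tag2: 1, tag3: 1 }
      .pairs()          // [['tag1', 2], ['tag2', 1], ['tag3', 1]]
      .sortBy(function (pair) { return -pair[1]; })
      .value();

    // tagCounts => [['tag1', 2], ['tag2', 1], ['tag3', 1]]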

Condition on count of associated records in SQL

Submitted by 独自空忆成欢 on 2020-06-28 04:01:18
Question: I have the following tables (with the given columns):

    houses (id)
    users (id, house_id, active)
    custom_values (name, house_id, type)

I want to get all the (distinct) houses, together with the count of associated users, that:

- have at least 1 associated custom_value whose name column contains the string 'red' (case insensitive) AND whose type column value is 'mandatory';
- have at least 100 associated users whose status column is 'active'.

How can I run this query in PostgreSQL? Right now I have this
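The asker's own attempt is cut off above. A sketch of one way to express this in PostgreSQL, assuming users.active is a boolean and expressing the custom_values condition as an EXISTS filter:

    SELECT h.id, count(u.id) AS active_users
    FROM houses h
    JOIN users u
      ON u.house_id = h.id
     AND u.active                      -- assumes a boolean column
    WHERE EXISTS (
      SELECT 1
      FROM custom_values cv
      WHERE cv.house_id = h.id
        AND cv.name ILIKE '%red%'      -- case-insensitive substring match
        AND cv.type = 'mandatory'
    )
    GROUP BY h.id
    HAVING count(u.id) >= 100;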

Get the row corresponding to the max in pandas GroupBy

Submitted by 喜你入骨 on 2020-06-25 21:45:48
Question: Simple DataFrame:

    df = pd.DataFrame({'A': [1,1,2,2], 'B': [0,1,2,3], 'C': ['a','b','c','d']})
    df
       A  B  C
    0  1  0  a
    1  1  1  b
    2  2  2  c
    3  2  3  d

For every value of column A (via groupby), I want to get the value of column C for which column B is maximal. For example, for group 1 of column A, the maximum of column B is 1, so I want the value "b" of column C:

       A  C
    0  1  b
    1  2  d

No need to assume column B is sorted; performance is the top priority, then elegance.

Answer 1: Check with sort_values + drop_duplicates df
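The answer's code is truncated. A sketch of the hinted approach, plus the common idxmax alternative, using the question's own data:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 1, 2, 2],
                       'B': [0, 1, 2, 3],
                       'C': ['a', 'b', 'c', 'd']})

    # As the answer hints: sort by B, then keep the last (max-B) row per A.
    out = (df.sort_values('B')
             .drop_duplicates('A', keep='last')
             [['A', 'C']])

    # Alternative: idxmax returns the index of the max B within each group.
    out2 = df.loc[df.groupby('A')['B'].idxmax(), ['A', 'C']]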

in R dplyr why do I need to ungroup() after I count()?

Submitted by 本秂侑毒 on 2020-06-25 09:09:28
Question: When I first started programming in R I would often use dplyr count().

    library(tidyverse)
    mtcars %>% count(cyl)

Once I started using apply functions I started running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away. I don't have a particular reproducible example to show. But can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or
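The question is cut off before an answer. The usual explanation is that count() adds its variables to any existing groups and its internal summarise() then drops only the last one, so data that was grouped before the count stays grouped afterwards. A small sketch of the resulting surprise:

    library(dplyr)

    grouped <- mtcars %>%
      group_by(gear) %>%
      count(cyl)

    group_vars(grouped)  # "gear" -- the earlier grouping survives the count

    # A later mutate() therefore runs per gear group:
    grouped %>% mutate(prop = n / sum(n))              # proportions within each gear

    # ungroup() first gives proportions over the whole table instead:
    grouped %>% ungroup() %>% mutate(prop = n / sum(n))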

Pandas Groupby: Count and mean combined

Submitted by 吃可爱长大的小学妹 on 2020-06-24 07:15:20
Question: Working with pandas to try to summarise a dataframe as a count of certain categories, as well as the mean sentiment score for these categories. There is a table full of strings that have different sentiment scores, and I want to group each text source by saying how many posts it has, as well as the average sentiment of those posts. My (simplified) dataframe looks like this:

    source  text          sent
    --------------------------------
    bar     some string    0.13
    foo     alt string    -0.8
    bar     another str    0.7
    foo
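The excerpt ends mid-table. A sketch using only the rows visible above, combining the count and the mean in a single agg call:

    import pandas as pd

    df = pd.DataFrame({'source': ['bar', 'foo', 'bar'],
                       'text':   ['some string', 'alt string', 'another str'],
                       'sent':   [0.13, -0.8, 0.7]})

    # Posts per source and their mean sentiment, in one groupby pass.
    summary = df.groupby('source')['sent'].agg(['count', 'mean'])
    print(summary)
    #         count   mean
    # source
    # bar         2  0.415
    # foo         1 -0.800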