group-by

dplyr: Subtracting values group-wise by group that matches given condition

Submitted by China☆狼群 on 2020-07-07 07:08:25
Question: Right now I'm refactoring a 'base'-based R script to use 'dplyr' instead. Basically, I want to group_by gene and subtract the values group-wise by a group that matches a given condition. In this case, I want the values where gene == 'C' and want to subtract them from all the others. Simplified data:

    x <- data.frame('gene'   = c('A','A','A','B','B','B','C','C','C'),
                    'sample' = rep_len(c('wt','mut1','mut2'), 3),
                    'value'  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

      gene sample value
    1    A     wt  32.3
    2    A   mut1  31.0
    3    A
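The excerpt cuts off before any answer. A minimal dplyr sketch of one reading of the problem, assuming each gene has exactly one value per sample and the subtraction should be matched sample-wise; the column name `adjusted` is illustrative:

    library(dplyr)

    x <- data.frame(gene   = c('A','A','A','B','B','B','C','C','C'),
                    sample = rep_len(c('wt','mut1','mut2'), 3),  # length-3 vector recycled by data.frame()
                    value  = c(32.3,31,30.5,25,25.3,22.1,20.5,21.2,19.8))

    # Group by sample so each gene's value sits next to the matching
    # reference measurement, then subtract the gene == 'C' value.
    x %>%
      group_by(sample) %>%
      mutate(adjusted = value - value[gene == 'C']) %>%
      ungroup()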

clickhouse downsample into OHLC time bar intervals

Submitted by 三世轮回 on 2020-07-06 20:12:01
Question: For a table containing e.g. a date/price timeseries with prices every millisecond, how can this be downsampled into groups of open-high-low-close (OHLC) rows with a time interval of e.g. one minute?

Answer 1: While the option with arrays will work, the simplest option here is to use a combination of GROUP BY on time intervals with the min, max, argMin, argMax aggregate functions.

    SELECT id, minute,
           max(value) AS high,
           min(value) AS low,
           avg(value) AS avg,
           argMin(value, timestamp) AS first,
           argMax(value,
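The answer's query is truncated above. A complete sketch along the same lines, assuming a table named ticks with columns id, timestamp, and value; ClickHouse's toStartOfMinute truncates a timestamp to its minute, and argMin/argMax pick the value at the earliest/latest timestamp in each group:

    SELECT
        id,
        toStartOfMinute(timestamp) AS minute,
        argMin(value, timestamp)   AS open,   -- value at the earliest timestamp in the minute
        max(value)                 AS high,
        min(value)                 AS low,
        argMax(value, timestamp)   AS close   -- value at the latest timestamp in the minute
    FROM ticks
    GROUP BY id, minute
    ORDER BY id, minute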

Group Array with count

Submitted by 浪子不回头ぞ on 2020-07-03 04:42:47
Question: I have an array of items that contains several properties. One of the properties is an array of tags. What is the best way to get all the tags used in those items, ordered by the number of times those tags are used on those items? I've been looking at underscore.js but am not getting the expected results. return _.groupBy(items, 'tags'); Example of my data:

    item1
      - itemName: item1
      - tags (array): tag1, tag2
    item2
      - itemName: item2
      - tags (array): tag1, tag3

so I'm trying
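The excerpt cuts off before any answer. _.groupBy keys on the whole tags array rather than the individual tags, which is likely why it didn't give the expected result. A minimal underscore.js sketch (the data literal is reconstructed from the description above): flatten all the tag arrays, count occurrences, and sort by count descending:

    var _ = require('underscore');

    var items = [
      { itemName: 'item1', tags: ['tag1', 'tag2'] },
      { itemName: 'item2', tags: ['tag1', 'tag3'] }
    ];

    var tagCounts = _.chain(items)
      .pluck('tags')    // [['tag1','tag2'], ['tag1','tag3']]
      .flatten()        // ['tag1', 'tag2', 'tag1', 'tag3']
      .countBy()        // { tag1: 2, tag2: 1, tag3: 1 }
      .pairs()          // [['tag1', 2], ['tag2', 1], ['tag3', 1]]
      .sortBy(function (pair) { return -pair[1]; })
      .value();

    // tagCounts => [['tag1', 2], ['tag2', 1], ['tag3', 1]]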

Condition on count of associated records in SQL

Submitted by 独自空忆成欢 on 2020-06-28 04:01:18
Question: I have the following tables (with the given columns):

    houses (id)
    users (id, house_id, active)
    custom_values (name, house_id, type)

I want to get all the (distinct) houses, together with the count of associated users, that:

- have at least 1 associated custom_value whose name column contains the string 'red' (case insensitive) AND whose type column value is 'mandatory';
- have at least 100 associated users whose status column is 'active'.

How can I run this query in PostgreSQL? Right now I have this
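The asker's own attempt is cut off above. A sketch of one way to express this in PostgreSQL, assuming users.active is a boolean and expressing the custom_values condition as an EXISTS filter:

    SELECT h.id, count(u.id) AS active_users
    FROM houses h
    JOIN users u
      ON u.house_id = h.id
     AND u.active                      -- assumes a boolean column
    WHERE EXISTS (
      SELECT 1
      FROM custom_values cv
      WHERE cv.house_id = h.id
        AND cv.name ILIKE '%red%'      -- case-insensitive substring match
        AND cv.type = 'mandatory'
    )
    GROUP BY h.id
    HAVING count(u.id) >= 100;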

Get the row corresponding to the max in pandas GroupBy

Submitted by 喜你入骨 on 2020-06-25 21:45:48
Question: Simple DataFrame:

    df = pd.DataFrame({'A': [1,1,2,2], 'B': [0,1,2,3], 'C': ['a','b','c','d']})
    df
       A  B  C
    0  1  0  a
    1  1  1  b
    2  2  2  c
    3  2  3  d

For every value of column A (via groupby), I want to get the value of column C for which column B is maximal. For example, for group 1 of column A, the maximum of column B is 1, so I want the value "b" of column C:

       A  C
    0  1  b
    1  2  d

No need to assume column B is sorted; performance is the top priority, then elegance.

Answer 1: Check with sort_values + drop_duplicates df
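The answer's code is truncated. A sketch of the hinted approach, plus the common idxmax alternative, using the question's own data:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 1, 2, 2],
                       'B': [0, 1, 2, 3],
                       'C': ['a', 'b', 'c', 'd']})

    # As the answer hints: sort by B, then keep the last (max-B) row per A.
    out = (df.sort_values('B')
             .drop_duplicates('A', keep='last')
             [['A', 'C']])

    # Alternative: idxmax returns the index of the max B within each group.
    out2 = df.loc[df.groupby('A')['B'].idxmax(), ['A', 'C']]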

in R dplyr why do I need to ungroup() after I count()?

Submitted by 本秂侑毒 on 2020-06-25 09:09:28
Question: When I first started programming in R I would often use dplyr count().

    library(tidyverse)
    mtcars %>% count(cyl)

Once I started using apply functions I started running into issues with count(). If I simply added ungroup() to the end of my count() calls, the problems would go away. I don't have a particular reproducible example to show. But can somebody explain what the issue likely was, why ungroup() always fixed it, and whether there are any drawbacks to consistently using ungroup() after every count(), or
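The question is cut off before an answer. The usual explanation is that count() adds its variables to any existing groups and its internal summarise() then drops only the last one, so data that was grouped before the count stays grouped afterwards. A small sketch of the resulting surprise:

    library(dplyr)

    grouped <- mtcars %>%
      group_by(gear) %>%
      count(cyl)

    group_vars(grouped)  # "gear" -- the earlier grouping survives the count

    # A later mutate() therefore runs per gear group:
    grouped %>% mutate(prop = n / sum(n))              # proportions within each gear

    # ungroup() first gives proportions over the whole table instead:
    grouped %>% ungroup() %>% mutate(prop = n / sum(n))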

Pandas Groupby: Count and mean combined

Submitted by 吃可爱长大的小学妹 on 2020-06-24 07:15:20
Question: Working with pandas to try to summarise a dataframe as a count of certain categories, as well as the mean sentiment score for these categories. There is a table full of strings that have different sentiment scores, and I want to group each text source by saying how many posts it has, as well as the average sentiment of those posts. My (simplified) dataframe looks like this:

    source  text          sent
    --------------------------------
    bar     some string    0.13
    foo     alt string    -0.8
    bar     another str    0.7
    foo
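The excerpt ends mid-table. A sketch using only the rows visible above, combining the count and the mean in a single agg call:

    import pandas as pd

    df = pd.DataFrame({'source': ['bar', 'foo', 'bar'],
                       'text':   ['some string', 'alt string', 'another str'],
                       'sent':   [0.13, -0.8, 0.7]})

    # Posts per source and their mean sentiment, in one groupby pass.
    summary = df.groupby('source')['sent'].agg(['count', 'mean'])
    print(summary)
    #         count   mean
    # source
    # bar         2  0.415
    # foo         1 -0.800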