group-by

add counter column by arranging two variables (dplyr)

Submitted by 点点圈 on 2021-01-27 20:53:10
Question: I've been looking for a while here and there but couldn't find any solution for my situation. I have a data frame with IDs and VAR mixed within it. Below I tried to reproduce a sample:

require(dplyr)
set.seed(123)
N <- 3
T <- 4
id <- rep(letters[1:N], each = T)
var <- rep(sample(seq(1:100), T), N)
row <- sample(seq(1:(N * T)), replace = F)
dt <- data.frame(ID = id, VAR = var, ROW = row) %>% arrange(ROW) %>% select(-ROW)

and I'd like to arrange by ID and VAR and add a counter per group in order to get …
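A per-group counter like this is a sort followed by a within-group row number. The question asks for dplyr; as a hedged sketch of the same idea in pandas (the data below is illustrative, only the ID/VAR column names come from the question):

```python
import pandas as pd

# Illustrative frame with the question's ID/VAR columns.
dt = pd.DataFrame({
    "ID":  ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "VAR": [31, 79, 51, 14] * 3,
})

# Sort by ID and VAR, then number the rows within each ID group.
out = dt.sort_values(["ID", "VAR"]).assign(
    COUNTER=lambda d: d.groupby("ID").cumcount() + 1
)
print(out)
```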

MySQL merge count multiple columns

Submitted by て烟熏妆下的殇ゞ on 2021-01-27 19:55:47
Question: I have a table (tb_data) which looks like this:

| Disease | Additional_Disease1 | Additional_Disease2 | Additional_Disease3 | Additional_Disease4 |
| A01     | A03                 | A03                 |                     |                     |
| A03     | A02                 |                     |                     |                     |
| A03     | A05                 |                     |                     |                     |
| A03     | A05                 |                     |                     |                     |
| A02     | A05                 | A01                 | A03                 |                     |

…
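The excerpt is cut off, but the title suggests counting how often each disease code occurs across the Disease and Additional_Disease columns combined. The question is about MySQL; purely as an illustration of that counting logic, a pandas sketch (the blank cells above are represented as empty strings):

```python
import pandas as pd

# Rows mirror the excerpt above; empty strings stand in for the blank cells.
tb_data = pd.DataFrame({
    "Disease":             ["A01", "A03", "A03", "A03", "A02"],
    "Additional_Disease1": ["A03", "A02", "A05", "A05", "A05"],
    "Additional_Disease2": ["A03", "",    "",    "",    "A01"],
    "Additional_Disease3": ["",    "",    "",    "",    "A03"],
    "Additional_Disease4": ["",    "",    "",    "",    ""],
})

# Stack every disease column into one series, drop blanks, and count each code.
counts = (
    tb_data.melt(value_name="code")["code"]
           .replace("", pd.NA)
           .dropna()
           .value_counts()
)
print(counts)
```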

removing the first 3 rows of a group with conditional statement in r

Submitted by 喜夏-厌秋 on 2021-01-27 19:10:35
Question: I would like to remove rows that do not fulfil the condition that I want. For example:

Event Value
1     1
1     0
1     0
1     0
2     8
2     7
2     1
2     0
2     0
2     0
3     8
3     0
3     0
3     0
3     0

If, per event, there is a number higher than 2 in the Value column (Value > 2), remove the first 3 rows starting from that Value that does not fulfil the criteria. It should look like this:

Event Value
1     1
1     0
1     0
1     0
2     0
2     0
3     0
3     0

I have been able to remove the first row of each Event that meets the criteria, but haven't …
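Reading the expected output, the rule appears to be: within each Event, every row whose Value exceeds 2 is dropped together with the two rows that follow it. The question is tagged R; as an illustration of that interpretation only, a pandas sketch:

```python
import pandas as pd

df = pd.DataFrame({
    "Event": [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "Value": [1, 0, 0, 0, 8, 7, 1, 0, 0, 0, 8, 0, 0, 0, 0],
})

# Rows whose Value exceeds the threshold trigger a drop of themselves
# plus the next two rows, but only within the same Event.
trigger = df["Value"] > 2
by_event = trigger.groupby(df["Event"])
drop = trigger | by_event.shift(1, fill_value=False) | by_event.shift(2, fill_value=False)

out = df[~drop]
print(out)
```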

Assign max value of group to all rows in that group

Submitted by 断了今生、忘了曾经 on 2021-01-27 17:47:44
Question: I would like to assign the max value of a group to all rows within that group. How do I do that? I have a dataframe containing the names of the groups and the max number of credits that belongs to each of them:

course_credits <- aggregate(bsc_academic$Credits, by = list(bsc_academic$Course_code), max)

which gives

   Course Credits
1 ABC1000     6.5
2 ABC1003     6.5
3 ABC1004     6.5
4 ABC1007     5.0
5 ABC1010     6.5
6 ABC1021     6.5
7 ABC1023     6.5

The main dataframe looks like this: Appraisal.Type Resits Credits Course_code …
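Broadcasting a per-group maximum back onto every row is exactly what the asker describes; in pandas this is groupby plus transform("max"). The question itself is in R, so this is only a sketch of the idea (the Course_code/Credits names come from the excerpt, the values are made up):

```python
import pandas as pd

# Toy frame with the two columns named in the excerpt (values are made up).
bsc_academic = pd.DataFrame({
    "Course_code": ["ABC1000", "ABC1000", "ABC1003", "ABC1003", "ABC1007"],
    "Credits":     [6.5, 3.0, 6.5, 5.0, 5.0],
})

# transform('max') returns one value per row, so the group maximum can be
# assigned straight back onto the original frame.
bsc_academic["Max_credits"] = bsc_academic.groupby("Course_code")["Credits"].transform("max")
print(bsc_academic)
```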

Edit dataframe entries using groupby object --pandas

Submitted by 时光总嘲笑我的痴心妄想 on 2021-01-27 15:06:17
Question: Consider the following dataframe:

index count signal
1     1     1
2     1     NAN
3     1     NAN
4     1     -1
5     1     NAN
6     2     NAN
7     2     -1
8     2     NAN
9     3     NAN
10    3     NAN
11    3     NAN
12    4     1
13    4     NAN
14    4     NAN

I need to 'ffill' the NaNs in 'signal', and values with a different 'count' value should not affect each other, such that I should get the following dataframe:

index count signal
1     1     1
2     1     1
3     1     1
4     1     -1
5     1     -1
6     2     NAN
7     2     -1
8     2     -1
9     3     NAN
10    3     NAN
11    3     NAN
12    4     1
13    4     1
14    4     1

Right now I iterate through each data frame in a group by …
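A group-wise forward fill replaces the per-group loop the asker mentions: groupby("count") followed by ffill() only propagates values within each count group, so NaNs at the start of a group stay NaN. A minimal sketch using the question's columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "count":  [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "signal": [1, np.nan, np.nan, -1, np.nan,
               np.nan, -1, np.nan,
               np.nan, np.nan, np.nan,
               1, np.nan, np.nan],
})

# Forward fill within each 'count' group only.
df["signal"] = df.groupby("count")["signal"].ffill()
print(df)
```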

Add unique groups to DF for each row including sum from other columns

Submitted by 亡梦爱人 on 2021-01-27 12:13:08
Question: I have a DataFrame looking like this:

ID field_1    area_1 field_2    area_2 field_3    area_3 field_4    area_4
1  scoccer    500    basketball 200    swimming   100    basketball 50
2  volleyball 100    np.nan     np.nan np.nan     np.nan np.nan     np.nan
3  basketball 1000   football   10     np.nan     np.nan np.nan     np.nan
4  swimming   280    swimming   200    basketball 320    np.nan     np.nan
5  volleyball 110    football   160    volleyball 30     np.nan     np.nan

The original DataFrame has the same structure but contains columns field_1 up to field_30 as well as area_1 …
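The excerpt is truncated, but the title suggests summing the area per distinct field for each ID (e.g. the two basketball entries of ID 1 combining to 250). A hedged pandas sketch of that reshaping, using pd.wide_to_long to pair up the field_i/area_i columns (only the first two IDs are reproduced here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID":      [1, 2],
    "field_1": ["scoccer", "volleyball"],
    "area_1":  [500, 100],
    "field_2": ["basketball", np.nan],
    "area_2":  [200, np.nan],
    "field_3": ["swimming", np.nan],
    "area_3":  [100, np.nan],
    "field_4": ["basketball", np.nan],
    "area_4":  [50, np.nan],
})

# Pair field_i with area_i, drop empty slots, then sum the area per ID and field.
long = (
    pd.wide_to_long(df, stubnames=["field", "area"], i="ID", j="slot", sep="_")
      .reset_index()
)
sums = (
    long.dropna(subset=["field"])
        .groupby(["ID", "field"], as_index=False)["area"].sum()
)
print(sums)
```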

Sort by most recent but keep together by another ID column

Submitted by 余生颓废 on 2021-01-27 07:22:51
Question: I am trying to get some sorting and keep-together (not really grouping) working. In my sample data I would like to keep the DealerIDs together, sorted by IsPrimaryDealer DESC, but show the group (OK, maybe it is grouping) of dealers ordered by the ones with the most recent entry. Result set 2 is the closest, but Grant and his brother should be displayed as the first two rows, in that order. (Grant should be row 1, Grants Brother row 2, because Grants Brother was the most recently added.) DECLARE @temp …
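The question is T-SQL and its @temp table definition is cut off above, so the column names below are assumptions. Purely as a pandas illustration of the ordering logic: broadcast each DealerID group's most recent Created value to all of its rows, sort the groups by that value descending, then put primary dealers first within each group.

```python
import pandas as pd

# Hypothetical columns standing in for the @temp table in the question.
dealers = pd.DataFrame({
    "DealerID":        [1, 1, 2, 3],
    "DealerName":      ["Grant", "Grants Brother", "Alice", "Bob"],
    "IsPrimaryDealer": [1, 0, 1, 1],
    "Created":         pd.to_datetime(["2021-01-01", "2021-01-20",
                                       "2021-01-10", "2021-01-05"]),
})

# Most recent entry per DealerID, broadcast back to every row of the group,
# so whole groups can be ordered by it while staying together.
dealers["GroupLatest"] = dealers.groupby("DealerID")["Created"].transform("max")

result = dealers.sort_values(
    ["GroupLatest", "DealerID", "IsPrimaryDealer"],
    ascending=[False, True, False],
).drop(columns="GroupLatest")
print(result)
```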

How to do a Leave One Out cross validation by group / subset?

Submitted by 僤鯓⒐⒋嵵緔 on 2021-01-24 10:54:44
Question: This question is the second part of a previous question (Linear Regression prediction in R using Leave One out Approach). I'm trying to build models for each country and generate linear regression predictions using the leave-one-out approach. In other words, in the code below, when building model1 and model2 the "data" used should not be the entire data set; instead, it should be a subset of the dataset (one country). Each country's data should be evaluated using a model built with data specific to …
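The R code the asker refers to is not shown in the excerpt, so the sketch below only illustrates "leave one out within each country subset" in Python: for every country, each row is predicted by a model fit on that country's remaining rows. The column names (country, x, y) and the data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Hypothetical data: one predictor x, response y, grouped by country.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "country": np.repeat(["DE", "FR", "US"], 5),
    "x": rng.normal(size=15),
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.1, size=15)

preds = []
for country, grp in df.groupby("country"):
    X, y = grp[["x"]].to_numpy(), grp["y"].to_numpy()
    # Leave-one-out within this country's subset only.
    for train_idx, test_idx in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        preds.append((country, grp.index[test_idx][0],
                      model.predict(X[test_idx])[0]))

loo_predictions = pd.DataFrame(preds, columns=["country", "row", "prediction"])
print(loo_predictions)
```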