dplyr

Unique body count column

≯℡__Kan透↙ submitted on 2021-02-05 07:10:32
Question: I'm trying to add a body count for each unique person. Each person has multiple data points.

df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                 Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))

This is what I'd like it to look like:

  PERSON Y UNIQ_CT
1      A 2       1
2      A 5       0
3      A 4       0
4      B 1       1
5      B 2       0
6      C 5       1
7      C 3       0
8      C 7       0
9      C 1       0

Answer 1: You can use duplicated() and negate it:

transform(df, UNIQ_CT = as.integer(!duplicated(PERSON)))

Answer 2: Since the question has the dplyr tag, here is an option
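A minimal runnable version of Answer 1 on the question's own data, assuming the new column should be named UNIQ_CT as in the desired output. duplicated() is TRUE for every repeat of a PERSON value after its first row, so negating it marks each person's first row, and as.integer() turns TRUE/FALSE into 1/0.

```r
df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                 Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))

# !duplicated() is TRUE only on the first row of each PERSON (in row order);
# as.integer() converts that flag to 1 (first row) or 0 (repeat)
res <- transform(df, UNIQ_CT = as.integer(!duplicated(PERSON)))
print(res)

# Summing the indicator gives the number of distinct people
sum(res$UNIQ_CT)
```

Note that the 1 lands on whichever row of a person comes first in the current row order, so sort the data first if a specific row should carry it.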

How to create a co-occurrence matrix calculated from combinations by ID/row in R?

♀尐吖头ヾ submitted on 2021-02-05 07:00:14
Question: Update: Thanks to @jazzurro for his answer. It made me realize that the duplicates may just complicate things. I hope that keeping only unique values per row simplifies the task.

df <- data.frame(ID = c(1,2,3,4,5),
                 CTR1 = c("England", "England", "England", "China", "Sweden"),
                 CTR2 = c("England", "China", "China", "England", NA),
                 CTR3 = c("USA", "USA", "USA", "USA", NA),
                 CTR4 = c(NA, NA, NA, NA, NA),
                 CTR5 = c(NA, NA, NA, NA, NA),
                 CTR6 = c(NA, NA, NA, NA, NA))

ID CTR1 CTR2 CTR3 CTR4 CTR5 CTR6
1
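The desired output is cut off above, so this base-R sketch assumes "co-occurrence" means the number of IDs in which two countries appear together, counting each country at most once per row (as the update suggests). It builds a 0/1 incidence matrix (IDs by countries) and takes its cross-product.

```r
df <- data.frame(ID = c(1, 2, 3, 4, 5),
                 CTR1 = c("England", "England", "England", "China", "Sweden"),
                 CTR2 = c("England", "China", "China", "England", NA),
                 CTR3 = c("USA", "USA", "USA", "USA", NA),
                 CTR4 = NA, CTR5 = NA, CTR6 = NA,
                 stringsAsFactors = FALSE)

ctr_cols <- grep("^CTR", names(df), value = TRUE)

# Long format: one (ID, country) pair per cell; unlist() walks the columns
# in order, so the ID vector is repeated once per CTR column
long <- data.frame(ID = rep(df$ID, times = length(ctr_cols)),
                   country = unlist(df[ctr_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Drop empty cells and repeated countries within the same ID
long <- unique(long[!is.na(long$country), ])

# Incidence matrix: rows are IDs, columns are countries, entries are 0/1
incidence <- unclass(table(long$ID, long$country))

# Entry [i, j] counts the IDs containing both country i and country j;
# the diagonal counts the IDs containing each country at all
co_occurrence <- crossprod(incidence)
print(co_occurrence)
```

If duplicates within a row should count after all, skipping the unique() step makes the cell values counts instead of 0/1 flags.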

Select rows from dataframe with unique combination of values from multiple columns

元气小坏坏 submitted on 2021-02-05 06:59:05
Question: I have a data.frame in R that is a catalog of results from baseball games for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that because the data.frame combines the logs of every team, each row essentially has another row somewhere else in the data.frame that is its mirror image. For example:

team opponent_team       date result team_runs opponent_runs
 BAL           BOS 2010-04-05      W         5             4
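One way to drop the mirror rows, sketched in base R: build an order-independent key from the two team names plus the date, then keep the first row per key. The sample below is hypothetical: the BAL/BOS row from the question, its assumed mirror, and an invented NYY/TB game.

```r
# Hypothetical sample: a game, its mirror image, and an unrelated game
games <- data.frame(team          = c("BAL", "BOS", "NYY"),
                    opponent_team = c("BOS", "BAL", "TB"),
                    date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06")),
                    result        = c("W", "L", "W"),
                    team_runs     = c(5, 4, 3),
                    opponent_runs = c(4, 5, 2),
                    stringsAsFactors = FALSE)

# Alphabetically smaller team first, then the larger, then the date:
# a row and its mirror image produce the same key
game_key <- paste(pmin(games$team, games$opponent_team),
                  pmax(games$team, games$opponent_team),
                  games$date)

# Keep only the first row seen for each key
unique_games <- games[!duplicated(game_key), ]
print(unique_games)
```

This assumes the same two teams never meet twice on one date; a doubleheader would collapse into a single row unless a game-number column is added to the key.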

How to divide between groups of rows using dplyr

不打扰是莪最后的温柔 submitted on 2021-02-05 06:42:05
Question: I have similar data and want exactly the result described in this question: How to divide between groups of rows using dplyr? The only difference is that in my data the "condition" column does not always contain both "A" and "B", so sometimes there is no denominator or no numerator.

x <- data.frame(
  name = rep(letters[1:4], each = 2),
  condition = rep(c("A", "B"), times = 4),
  value = c(2,10,4,20,8,40,20,100)
)
x = x[-c(4,5),] #this is my dataframe

I want to remove rows that do not
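A dplyr sketch that keeps only names having both conditions before dividing. It assumes the wanted ratio is B / A and that each name has at most one A row and one B row.

```r
library(dplyr)

x <- data.frame(name = rep(letters[1:4], each = 2),
                condition = rep(c("A", "B"), times = 4),
                value = c(2, 10, 4, 20, 8, 40, 20, 100))
x <- x[-c(4, 5), ]  # "b" loses its B row, "c" loses its A row

ratios <- x %>%
  group_by(name) %>%
  # keep only names that have both a numerator (B) and a denominator (A)
  filter(all(c("A", "B") %in% condition)) %>%
  # assumption: the ratio wanted is B / A
  summarise(ratio = value[condition == "B"] / value[condition == "A"])
print(ratios)
```

Names "b" and "c" are dropped by the filter() step instead of producing an error or an empty result inside summarise().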

Make prediction for each group differently

瘦欲@ submitted on 2021-02-05 06:41:09
Question: I have a dataset that looks like this:

  Category Weekly_Date     a     b
  <chr>    <date>      <dbl> <dbl>
1 aa       2018-07-01  36.6    1.4
2 aa       2018-07-02   5.30   0
3 bb       2018-07-01   4.62   1.2
4 bb       2018-07-02   3.71   1.5
5 cc       2018-07-01   3.41  12
...

I fitted a linear regression for each group separately:

fit_linreg <- train %>%
  group_by(Category) %>%
  do(model = lm(Target ~ Unit_price + Unit_discount, data = .))

Now I have a different model for each category:

aa model1
bb model2
cc model3

So, I need to apply each
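A base-R sketch of fitting one model per category and scoring new rows with the matching model. It swaps do() for split() plus lapply(), and the train/test data are hypothetical: they use the column names from the lm() formula (Category, Unit_price, Unit_discount, Target), since the printed sample shows different columns. Each category follows an exact linear rule so the predictions are easy to check.

```r
# Hypothetical training data: each category follows a different exact line
price    <- rep(1:6, times = 3)
discount <- rep(c(0, 1, 0, 2, 1, 3), times = 3)
category <- rep(c("aa", "bb", "cc"), each = 6)
target   <- ifelse(category == "aa", 1 + 2 * price,
            ifelse(category == "bb", 10 - price, 5 + 3 * discount))
train <- data.frame(Category = category, Unit_price = price,
                    Unit_discount = discount, Target = target,
                    stringsAsFactors = FALSE)

# One lm per category, stored in a list named by category
models <- lapply(split(train, train$Category), function(d)
  lm(Target ~ Unit_price + Unit_discount, data = d))

# Hypothetical new rows to score, in mixed category order
test <- data.frame(Category = c("cc", "aa", "bb", "aa"),
                   Unit_price = c(10, 10, 10, 0),
                   Unit_discount = c(1, 1, 1, 0),
                   stringsAsFactors = FALSE)

# Score each row with the model fitted on its own category
test$pred <- NA_real_
for (grp in unique(test$Category)) {
  rows <- test$Category == grp
  test$pred[rows] <- predict(models[[grp]], newdata = test[rows, ])
}
print(test)
```

Looking models up by name keeps the pairing explicit, so rows in any order get the right model; a category in the test data with no fitted model would make models[[grp]] NULL and predict() fail, which is arguably the safer outcome.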

dplyr pipeline in a function

这一生的挚爱 submitted on 2021-02-05 06:37:45
Question: I'm trying to put a dplyr pipeline in a function, but even after reading the vignette multiple times, as well as the tidy evaluation guide (https://tidyeval.tidyverse.org/dplyr.html), I still can't get it to work...

#Sample data:
dat <- read.table(text = "A ID B
1 X 83
2 X NA
3 X NA
4 Y NA
5 X 2
6 Y 2
12 Y 10
7 Y 18
8 Y 85", header = TRUE)

# What I'm trying to do:
x <- dat %>% filter(!is.na(B)) %>% count('ID') %>% filter(freq>3)
x$ID

# Now in a function:
n_occurences <- function(df, n, column){
  # Group by
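A sketch of the function using the {{ }} ("curly-curly") operator, assuming `column` is the grouping column (ID in the example) and that, as in the original pipeline, only rows with a non-missing B are counted. Note that the quoted count('ID') with a `freq` column appears to follow plyr::count(); dplyr::count() takes an unquoted column and names its result `n`, which clashes with the argument `n`, so the .data and .env pronouns are used to tell them apart.

```r
library(dplyr)

dat <- read.table(text = "A ID B
1 X 83
2 X NA
3 X NA
4 Y NA
5 X 2
6 Y 2
12 Y 10
7 Y 18
8 Y 85", header = TRUE)

# Assumption: `column` is the grouping column; B stays hard-coded as in the original
n_occurences <- function(df, n, column) {
  df %>%
    filter(!is.na(B)) %>%          # keep rows with a non-missing B
    count({{ column }}) %>%        # {{ }} forwards the unquoted column name
    filter(.data$n > .env$n) %>%   # .data$n: count column; .env$n: function argument
    pull({{ column }})
}

n_occurences(dat, 3, ID)
```

With this data, ID "X" has 2 non-missing B values and "Y" has 4, so a threshold of 3 returns only "Y".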

How to use apply function in a pipe operator

我怕爱的太早我们不能终老 submitted on 2021-02-04 21:38:32
Question: I have a data frame with a few character columns followed by a few numeric columns. Using the %>% operator, I want to add a new column containing the highest value among the numeric columns in each row. Let's say the data frame looks like this:

character1, character2, value1, value2, value3
"string", "string", 5, 7, 4
"string", "string", 3, 4, 2
"string", "string", 2, 8, 6

Then the new column should be 7 for the first row, 4 for the second row and 8 for the last row. I am trying to use the apply
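A sketch of calling apply() inside a magrittr pipe, assuming the numeric columns share the "value" prefix as in the example. When `.` appears nested inside an argument, %>% still passes the data frame as mutate()'s first argument and also binds it to `.`, so select(., ...) sees the same data.

```r
library(dplyr)

df <- data.frame(character1 = c("string", "string", "string"),
                 character2 = c("string", "string", "string"),
                 value1 = c(5, 3, 2),
                 value2 = c(7, 4, 8),
                 value3 = c(4, 2, 6))

out <- df %>%
  # `.` is the incoming data frame; apply(..., 1, max) takes max() over each
  # row of the numeric columns picked by starts_with("value")
  mutate(max_value = apply(select(., starts_with("value")), 1, max))
print(out)
```

For a fixed, small set of columns, the vectorised mutate(max_value = pmax(value1, value2, value3)) gives the same result without converting the rows to a matrix.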

R spread dataframe [duplicate]

烂漫一生 submitted on 2021-02-04 21:33:54
Question: This question already has answers here: Reshape multiple value columns to wide format (5 answers). Closed 7 months ago.

In R, how do I convert data1 into data2?

data1 = fread("
id year cost pf loss
A 2019-02 155 10 41
B 2019-03 165 14 22
B 2019-01 185 34 56
C 2019-02 350 50 0
A 2019-01 310 40 99")

data2 = fread("
id item 2019-01 2019-02 2019-03
A cost 30 155 NA
A pf 40 10 NA
A loss 99 41 NA
B cost 185 NA 160
B pf 34 NA 14
B loss 56 NA 22
C cost NA 350 NA
C pf NA 50 NA
C loss NA 0 NA")

I
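A sketch of the melt-then-dcast route with data.table (the package fread() comes from), assuming data.table is installed. melt() stacks cost, pf and loss into one value column labelled by item, and dcast() spreads year back out into columns. Computed from data1, A/cost/2019-01 comes out as 310 and B/cost/2019-03 as 165, whereas the data2 shown above has 30 and 160, which look like typos in the expected output.

```r
library(data.table)

data1 <- fread(text = "id year cost pf loss
A 2019-02 155 10 41
B 2019-03 165 14 22
B 2019-01 185 34 56
C 2019-02 350 50 0
A 2019-01 310 40 99")

# Long format: one row per id, year and measure; the measure name goes in `item`
long <- melt(data1, id.vars = c("id", "year"), variable.name = "item")

# Wide format: one row per id and item, one column per year;
# id/year combinations absent from data1 become NA
data2 <- dcast(long, id + item ~ year, value.var = "value")
print(data2)
```

Because melt() makes `item` a factor with levels in column order (cost, pf, loss), dcast() keeps that order within each id, matching the layout of data2.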

Passing column name into function

 ̄綄美尐妖づ submitted on 2021-02-04 21:06:56
Question: I have a simple problem with non-standard evaluation: passing a variable name as an argument into a function. As a reproducible example, here's a simple task: taking the mean of one variable, mpg, from the mtcars dataset. My end goal is a function where I can input the dataset and the variable and get the mean.

So, without a function:

library(tidyverse)
mtcars %>% summarise(mean = mean(mpg))
#>       mean
#> 1 20.09062

I've tried to use get() for non-standard evaluation, but I'm getting
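A sketch of two common tidy-evaluation patterns in recent dplyr versions; the function names mean_of() and mean_of_chr() are hypothetical. The first takes a bare column name and forwards it with {{ }}; the second takes the name as a string and looks it up through the .data pronoun.

```r
library(dplyr)

# Bare column name: {{ var }} forwards the unquoted name into summarise()
mean_of <- function(data, var) {
  data %>% summarise(mean = mean({{ var }}))
}

# Column name as a string: .data[[var]] looks the column up by name
mean_of_chr <- function(data, var) {
  data %>% summarise(mean = mean(.data[[var]]))
}

mean_of(mtcars, mpg)
mean_of_chr(mtcars, "mpg")
```

Both calls return the same one-row data frame as the pipeline without a function; the string form is handy when column names come from a character vector or a loop.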
