dplyr | 易学教程

Unique body count column

Question: I'm trying to add a body count for each unique person. Each person has multiple data points.

    df <- data.frame(PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
                     Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1))

This is what I'd like it to look like:

      PERSON Y UNIQ_CT
    1      A 2       1
    2      A 5       0
    3      A 4       0
    4      B 1       1
    5      B 2       0
    6      C 5       1
    7      C 3       0
    8      C 7       0
    9      C 1       0

Answer 1: You can use duplicated() and negate it. Note that R is case-sensitive, so the column must be referenced as PERSON:

    transform(df, UNIQ_CT = as.integer(!duplicated(PERSON)))

Answer 2: Since the question carries the dplyr tag, here is an option
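The second answer is cut off in this excerpt, so its actual code is unknown. As a rough sketch only, assuming the dplyr package is installed, the same duplicated() idea from Answer 1 could be written as a pipeline like this (the name `result` is made up for illustration):

```r
library(dplyr)

# Sample data copied from the question.
df <- data.frame(
  PERSON = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  Y = c(2, 5, 4, 1, 2, 5, 3, 7, 1)
)

# duplicated() is FALSE the first time a value appears and TRUE afterwards,
# so negating it marks the first row of each person with 1 and the rest with 0.
result <- df %>%
  mutate(UNIQ_CT = as.integer(!duplicated(PERSON)))

print(result)
```

Because duplicated() works on the whole column, this does not need group_by(); it only relies on the first occurrence of each person being the row that should carry the 1.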

How to create a co-occurrence matrix calculated from combinations by ID/row in R?

Question: Update: Thanks to @jazzurro for his answer. It made me realize that the duplicates may just complicate things. I hope that keeping only unique values per row simplifies the task.

    df <- data.frame(ID = c(1,2,3,4,5),
                     CTR1 = c("England", "England", "England", "China", "Sweden"),
                     CTR2 = c("England", "China", "China", "England", NA),
                     CTR3 = c("USA", "USA", "USA", "USA", NA),
                     CTR4 = c(NA, NA, NA, NA, NA),
                     CTR5 = c(NA, NA, NA, NA, NA),
                     CTR6 = c(NA, NA, NA, NA, NA))

    ID CTR1 CTR2 CTR3 CTR4 CTR5 CTR6
    1

Select rows from dataframe with unique combination of values from multiple columns

Question: I have a data.frame in R that is a catalog of results from baseball games for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of it. For example:

    team opponent_team date       result team_runs opponent_runs
    BAL  BOS           2010-04-05 W      5         4
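The excerpt ends before any answer, so the following is only a sketch of one possible approach, assuming dplyr is available. It builds an order-independent key from the two team columns with pmin() and pmax() and keeps one row per key and date. Only the first row comes from the question; the other rows, and the helper column names `side_a` and `side_b`, are invented for illustration.

```r
library(dplyr)

# First row is from the question; the remaining rows are made-up examples.
games <- data.frame(
  team          = c("BAL", "BOS", "NYY", "TBR"),
  opponent_team = c("BOS", "BAL", "TBR", "NYY"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06", "2010-04-06")),
  result        = c("W", "L", "L", "W"),
  team_runs     = c(5, 4, 2, 3),
  opponent_runs = c(4, 5, 3, 2),
  stringsAsFactors = FALSE
)

unique_games <- games %>%
  # pmin()/pmax() sort each pair alphabetically, so BAL-BOS and BOS-BAL
  # produce the same (side_a, side_b) key.
  mutate(side_a = pmin(team, opponent_team),
         side_b = pmax(team, opponent_team)) %>%
  # Keep the first row seen for each pairing on each date.
  distinct(side_a, side_b, date, .keep_all = TRUE) %>%
  select(-side_a, -side_b)

print(unique_games)
```

One caveat with this key: two games between the same teams on the same date (a doubleheader) would collapse into one row, so a real data set might need an additional column such as the score in the key.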

How to divide between groups of rows using dplyr

Question: I have similar data and want exactly the result described in this linked question: How to divide between groups of rows using dplyr? The only difference is that in my data the column "condition" does not always contain both "A" and "B", so sometimes there is no denominator or numerator.

    x <- data.frame(
      name = rep(letters[1:4], each = 2),
      condition = rep(c("A", "B"), times = 4),
      value = c(2,10,4,20,8,40,20,100)
    )
    x = x[-c(4,5),] # this is my dataframe

I want to remove rows that do not

Make prediction for each group differently

Question: I have a dataset that looks like this:

      Category Weekly_Date     a     b
      <chr>    <date>      <dbl> <dbl>
    1 aa       2018-07-01  36.6    1.4
    2 aa       2018-07-02   5.30   0
    3 bb       2018-07-01   4.62   1.2
    4 bb       2018-07-02   3.71   1.5
    5 cc       2018-07-01   3.41  12
    ...

I fitted a linear regression for each group separately:

    fit_linreg <- train %>%
      group_by(Category) %>%
      do(model = lm(Target ~ Unit_price + Unit_discount, data = .))

Now I have a different model for each category:

    aa model1
    bb model2
    cc model3

So, I need to apply each

dplyr pipeline in a function

Question: I'm trying to put a dplyr pipeline in a function, but even after reading the vignette multiple times, as well as the tidy evaluation guide (https://tidyeval.tidyverse.org/dplyr.html), I still can't get it to work...

    # Sample data:
    dat <- read.table(text = "A ID B
    1 X 83
    2 X NA
    3 X NA
    4 Y NA
    5 X 2
    6 Y 2
    12 Y 10
    7 Y 18
    8 Y 85", header = TRUE)

    # What I'm trying to do:
    x <- dat %>%
      filter(!is.na(B)) %>%
      count('ID') %>%
      filter(freq>3)
    x$ID

    # Now in a function:
    n_occurences <- function(df, n, column){
      # Group by
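The function body is cut off in this excerpt, so what follows is not the asker's code but a minimal sketch of how such a pipeline might be wrapped using the embrace operator `{{ }}` described in the tidy evaluation guide. It assumes a dplyr version that supports `{{ }}` and the `name` argument of count(). The function name `ids_with_min_rows` and the argument `min_rows` are hypothetical. Note that `count('ID')` with a `freq` column matches plyr's count(); dplyr's count() names its column `n` by default, which is why `name = "freq"` is set here.

```r
library(dplyr)

# Same values as the sample data in the question.
dat <- data.frame(
  A  = c(1, 2, 3, 4, 5, 6, 12, 7, 8),
  ID = c("X", "X", "X", "Y", "X", "Y", "Y", "Y", "Y"),
  B  = c(83, NA, NA, NA, 2, 2, 10, 18, 85),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the values of `column` that occur in more than
# `min_rows` rows after dropping rows where B is missing.
ids_with_min_rows <- function(df, min_rows, column) {
  df %>%
    filter(!is.na(B)) %>%
    # {{ column }} forwards the unquoted column name into count().
    count({{ column }}, name = "freq") %>%
    filter(freq > min_rows) %>%
    pull({{ column }})
}

frequent_ids <- ids_with_min_rows(dat, 3, ID)
print(frequent_ids)
```

Naming the count column `freq` and the threshold `min_rows` avoids a clash between a function argument called `n` and the default `n` column that dplyr's count() creates.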

How to use apply function in a pipe operator

Question: I have a data frame with a few character columns followed by a few numerical columns. Using the %>% operator, I want to add a new column that holds the highest value of the numerical columns in each row. Let's say the data frame looks like this:

    character1, character2, value1, value2, value3
    "string",   "string",   5,      7,      4
    "string",   "string",   3,      4,      2
    "string",   "string",   2,      8,      6

Then the new column should be 7 for the first row, 4 for the second row and 8 for the last row. I am trying to use the apply
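The question breaks off before the attempted apply() call. As one possible alternative, assuming dplyr is installed, the row-wise maximum can be sketched with base R's vectorised pmax() inside mutate(). The column name `highest` is an assumption chosen for illustration.

```r
library(dplyr)

# Data frame rebuilt from the example in the question.
df <- data.frame(
  character1 = rep("string", 3),
  character2 = rep("string", 3),
  value1 = c(5, 3, 2),
  value2 = c(7, 4, 8),
  value3 = c(4, 2, 6),
  stringsAsFactors = FALSE
)

# pmax() compares its arguments element by element, so each row gets the
# largest of value1, value2 and value3.
result <- df %>%
  mutate(highest = pmax(value1, value2, value3))

print(result$highest)
```

This sketch names the numeric columns explicitly. With many numeric columns, a selection-based approach would be less repetitive, at the cost of depending on newer dplyr helpers.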

R spread dataframe [duplicate]

Question: This question already has answers here: Reshape multiple value columns to wide format (5 answers). Closed 7 months ago.

In R, how do I convert data1 into data2?

    data1 = fread("
    id year    cost pf loss
    A  2019-02 155  10 41
    B  2019-03 165  14 22
    B  2019-01 185  34 56
    C  2019-02 350  50 0
    A  2019-01 310  40 99")

    data2 = fread("
    id item 2019-01 2019-02 2019-03
    A  cost 30      155     NA
    A  pf   40      10      NA
    A  loss 99      41      NA
    B  cost 185     NA      160
    B  pf   34      NA      14
    B  loss 56      NA      22
    C  cost NA      350     NA
    C  pf   NA      50      NA
    C  loss NA      0       NA")

I
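No answer is included in this excerpt. As a hedged sketch, assuming the tidyr package (version 1.1.0 or later for `names_sort`) and dplyr are installed, the reshape might be done by first stacking the three value columns and then spreading the years. The values below are taken from data1, so the cells where data2 as posted disagrees with data1 (30 versus 310 for A's cost in 2019-01, and 160 versus 165 for B's cost in 2019-03) follow data1. The name `wide` is made up for illustration.

```r
library(dplyr)
library(tidyr)

# data1 from the question, built with data.frame() instead of fread().
data1 <- data.frame(
  id   = c("A", "B", "B", "C", "A"),
  year = c("2019-02", "2019-03", "2019-01", "2019-02", "2019-01"),
  cost = c(155, 165, 185, 350, 310),
  pf   = c(10, 14, 34, 50, 40),
  loss = c(41, 22, 56, 0, 99),
  stringsAsFactors = FALSE
)

wide <- data1 %>%
  # Stack cost, pf and loss into one "item"/"value" pair per row.
  pivot_longer(cols = c(cost, pf, loss),
               names_to = "item", values_to = "value") %>%
  # Spread the years into columns; missing combinations become NA.
  pivot_wider(names_from = year, values_from = value, names_sort = TRUE)

print(wide)
```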

Passing column name into function

Question: I have a simple problem with non-standard evaluation: passing a variable name as an argument into a function. As a reproducible example, here's a simple task: taking the mean of one variable, mpg, from the mtcars dataset. My end goal is a function that takes the dataset and the variable as inputs and returns the mean. So, without a function:

    library(tidyverse)
    mtcars %>% summarise(mean = mean(mpg))
    #>       mean
    #> 1 20.09062

I've tried to use get() for non-standard evaluation, but I'm getting
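The error message and any answers are missing from this excerpt. One common way to write such a function, shown here only as a sketch and assuming a dplyr version that supports the embrace operator `{{ }}`, is to forward the unquoted column name into summarise(). The function name `column_mean` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: takes a data frame and an unquoted column name and
# returns a one-row data frame holding that column's mean.
column_mean <- function(data, var) {
  data %>%
    summarise(mean = mean({{ var }}))
}

result <- column_mean(mtcars, mpg)
print(result)
```

Called as `column_mean(mtcars, mpg)`, this should reproduce the 20.09062 shown in the question's version without a function, and the same helper can be reused with other columns such as `hp`.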

Passing column name into function
