dplyr

Efficient way of labelling based on start and end position

元气小坏坏 submitted on 2021-02-05 08:26:33
Question: I have 2 data frames:

    das <- data.frame(val = 1:20,
                      type = c("A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C"),
                      weigh = c(20,22,23,32,34,54,19,22,24,26,31,34,36,37,51,54,31,35,43,45))
    mapper <- data.frame(type = c("A","A","A","A","B","B","B","B","C","C","C","C"),
                         start = c(19,23,27,37, 17,25,39,50, 17,23,33,39),
                         end = c(23,27,37,55, 25,39,50,60, 23,33,39,48))

The expected output is:

      val type weigh labelweight
    1   1    A    20        A_19
    2   2    A    22        A_19
    3   3    A    23        A_23
    4   4    A    32        A_27
    5   5    A    34
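
The expected rows suggest each interval is half-open (start <= weigh < end): a weigh of 23 is labelled A_23, not A_19. A minimal sketch under that assumption uses a non-equi join instead of a row-wise loop. It assumes dplyr >= 1.1.0, which added join_by() and its between() helper with a bounds argument.

```r
library(dplyr)  # join_by() with between(..., bounds =) needs dplyr >= 1.1.0

das <- data.frame(val = 1:20,
                  type = rep(c("A", "B", "C"), times = c(6, 10, 4)),
                  weigh = c(20, 22, 23, 32, 34, 54, 19, 22, 24, 26,
                            31, 34, 36, 37, 51, 54, 31, 35, 43, 45))
mapper <- data.frame(type = rep(c("A", "B", "C"), each = 4),
                     start = c(19, 23, 27, 37, 17, 25, 39, 50, 17, 23, 33, 39),
                     end   = c(23, 27, 37, 55, 25, 39, 50, 60, 23, 33, 39, 48))

labelled <- das %>%
  # Match on the same type AND start <= weigh < end ("[)" = closed left, open right),
  # so a weigh sitting exactly on a boundary goes to the interval that starts there.
  left_join(mapper, by = join_by(type, between(weigh, start, end, bounds = "[)"))) %>%
  # Build the label from the type and the start of the matched interval.
  mutate(labelweight = paste(type, start, sep = "_")) %>%
  select(val, type, weigh, labelweight)

print(labelled)
```

Every weigh in this data falls inside some interval; if one did not, start would be NA and paste() would produce a label such as "A_NA", so a real version may want to guard for that. data.table's non-equi joins are an alternative with the same idea.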

R unable to load dplyr

时光总嘲笑我的痴心妄想 submitted on 2021-02-05 08:11:50
Question: I'm running Ubuntu 16 with R version 3.4.1. I have dplyr installed and can load it either when I run R from RStudio or when I start R with sudo from the terminal. However, if I run R without root permissions, I cannot load dplyr because of the following error:

    Error: package or namespace load failed for ‘dplyr’ in dyn.load(file, DLLpath = DLLpath, ...):
     unable to load shared object '<user-directory>/R/x86_64-pc-linux-gnu-library/3.4/Rcpp/libs/Rcpp.so':
      <user-directory>/anaconda3/lib/R/bin
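
The error path mentions anaconda3/lib/R, which hints (this is an assumption, not something the question confirms) that the non-root session may be a different R installation from the one that compiled the user library. A small diagnostic sketch, run once in each kind of session, shows which R is running and where packages and shared libraries are being looked up:

```r
# Which R installation is this session using? (Anaconda's R lives under .../anaconda3/lib/R)
cat("R home:   ", R.home(), "\n")
cat("R version:", R.version.string, "\n")

# Library directories that library() searches, in order.
print(.libPaths())

# Where Rcpp would be loaded from; character(0) means it is not found on these paths.
print(find.package("Rcpp", quiet = TRUE))

# Shared-library search path inherited from the shell environment.
cat("LD_LIBRARY_PATH:", Sys.getenv("LD_LIBRARY_PATH"), "\n")
```

If the two sessions report different values for R.home() or .libPaths(), packages such as Rcpp may need to be reinstalled from the R that will actually load them.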

dplyr::left_join produce NA values for new joined columns

百般思念 submitted on 2021-02-05 08:10:13
Question: I have two tables I want to join with left_join() from the dplyr package. The issue is that it produces NA values for all the new columns (the ones I'm after). As you can see below, left_join produces NA values for the new columns Incep.Price and DayCounter. Why does this happen, and how can it be resolved? Update: Thanks to @akrun, using left_join(Avanza.XML, checkpoint, by = c('Firm' = 'Firm')) solves the issue and the columns are joined correctly. However, the warning message is still the same,
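
When `by` is omitted, left_join() joins on every column name the two tables share. If they share more than Firm and those extra columns hold different values, no row matches and the new columns come back NA, which would be consistent with `by = c('Firm' = 'Firm')` fixing it. A minimal sketch with hypothetical toy tables (the Date column and all values are invented for illustration):

```r
library(dplyr)

# Hypothetical tables: both share "Firm" AND "Date", but the Date values differ.
Avanza.XML <- data.frame(Firm = c("FirmA", "FirmB"),
                         Date = c("2021-02-01", "2021-02-01"),
                         stringsAsFactors = FALSE)
checkpoint <- data.frame(Firm = c("FirmA", "FirmB"),
                         Date = c("2020-01-01", "2020-01-01"),
                         Incep.Price = c(100, 200),
                         DayCounter = c(5, 7),
                         stringsAsFactors = FALSE)

# No `by`: joins on all common columns (Firm and Date); nothing matches, so the
# new columns are all NA.
natural <- left_join(Avanza.XML, checkpoint)

# Joining on Firm only gives the intended matches (Date becomes Date.x / Date.y).
keyed <- left_join(Avanza.XML, checkpoint, by = "Firm")
```

`by = "Firm"` is shorthand for `by = c("Firm" = "Firm")`.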

How to compute running sum conditional on TRUE & FALSE

本秂侑毒 submitted on 2021-02-05 07:59:08
Question: I am attempting to create a new column that is a conditional difference based on a column of TRUE and FALSE values. If the lag-1 row is FALSE, the difference should be computed from either the beginning or the last TRUE row, whichever comes later in the data frame; however, if the lag-1 row is TRUE, the difference should be reset. I would like to use dplyr::mutate as much as possible. I'm attempting to use dplyr::lag with ifelse(), but I'm having a hard time with the
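
A common way to "reset after every TRUE" without nested ifelse() calls is to turn the lagged flag into a segment id with cumsum() and then compute the difference (or running sum) within each segment. This is a sketch of one reading of the requirement: the column names x and flag and the data are hypothetical, and the question's exact reset rule may differ.

```r
library(dplyr)

# Hypothetical data: x is the value column, flag the TRUE/FALSE column.
df <- data.frame(x = c(1, 3, 6, 10, 15, 21),
                 flag = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE))

out <- df %>%
  # A new segment starts on the row right after each TRUE row
  # (lag-1 row is TRUE -> the counter increases -> reset).
  mutate(segment = cumsum(lag(flag, default = FALSE))) %>%
  group_by(segment) %>%
  mutate(diff_since_reset = x - first(x),  # difference from the segment's first row
         running_sum = cumsum(x)) %>%      # running sum that restarts each segment
  ungroup()

print(out)
```

For this data the segments are rows 1-2, 3-5 and 6, so diff_since_reset is 0, 2, 0, 4, 9, 0.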

ranking with dplyr between groups

百般思念 submitted on 2021-02-05 07:58:25
Question: I have this data frame:

    library(dplyr)
    d <- data.frame(group1 = c("A","B","A","B"), group2 = c("e","f","e","f"), value = c(1,2,3,4))
    d %>%
      group_by(group2) %>%
      mutate(total_value = sum(value)) %>%
      arrange(-total_value) %>%
      mutate(rank = rank(-total_value, ties.method = "max"))

      group1 group2 value total_value  rank
      <fct>  <fct>  <dbl>       <dbl> <int>
    1 B      f          2           6     2
    2 B      f          4           6     2
    3 A      e          1           4     2
    4 A      e          3           4     2

and I'd like the rank column to show 1 for both f rows and 2 for both e rows. Basically after the arrange(
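
In the code above, the final mutate() still runs inside group_by(group2), so rank() only sees the two tied rows of each group and gives both the maximum rank, 2. A minimal sketch of one fix is to ungroup first and use dense_rank(), which gives tied values the same rank without gaps:

```r
library(dplyr)

d <- data.frame(group1 = c("A", "B", "A", "B"),
                group2 = c("e", "f", "e", "f"),
                value = c(1, 2, 3, 4))

res <- d %>%
  group_by(group2) %>%
  mutate(total_value = sum(value)) %>%
  ungroup() %>%                                  # rank across all rows, not within group2
  arrange(desc(total_value)) %>%
  mutate(rank = dense_rank(desc(total_value)))   # ties share a rank; next rank is +1

print(res)
```

With rank(..., ties.method = "min") on the ungrouped data the e rows would get 3 instead of 2, which is why dense_rank() fits the requested 1, 1, 2, 2.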

How to pass a filter statement as a function parameter in dplyr using quosure [duplicate]

白昼怎懂夜的黑 submitted on 2021-02-05 07:55:09
Question: This question already has answers here: dplyr/rlang: parse_expr with multiple expressions (3 answers). Closed 9 months ago. Using the dplyr package in R, I want to pass a filter statement as a parameter to a function. I don't know how to evaluate the statement as code instead of as a string. When I try the code below, I get an error message. I'm assuming I need a quosure or something, but I don't fully grasp that concept.

    data("PlantGrowth")
    myfunc <- function(df, filter_statement) {
      df %>%
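
Two common tidy-evaluation patterns fit this: pass the condition as an unevaluated expression and capture it with enquo(), or, if it arrives as a string, turn it into an expression with rlang::parse_expr(). A sketch of both, where the example condition and the name myfunc_str are hypothetical:

```r
library(dplyr)
library(rlang)

data("PlantGrowth")

# Pattern 1: the caller writes the condition as code; enquo() captures it as a
# quosure and !! splices it into filter().
myfunc <- function(df, filter_statement) {
  filter_statement <- enquo(filter_statement)
  df %>% filter(!!filter_statement)
}

# Pattern 2: the condition is a string; parse_expr() converts it to an expression first.
myfunc_str <- function(df, filter_string) {
  df %>% filter(!!parse_expr(filter_string))
}

a <- myfunc(PlantGrowth, group == "ctrl" & weight > 5)
b <- myfunc_str(PlantGrowth, 'group == "ctrl" & weight > 5')
```

The first pattern is usually preferable because the condition stays ordinary R code; parsing strings is mainly useful when conditions come from outside R.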

How do I pass a dynamic variable name created using enquo() to dplyr's mutate for evaluation?

人走茶凉 submitted on 2021-02-05 07:26:05
Question: I'm creating a workflow that repeats the same piping steps of renaming, selecting, and then mutating, all using a name I provide before the pipe. I have had success using enquo() and !! (bang-bang) to rename to my desired string and then select it again, but when I reach the mutate step it either repeats the text string as the column values or does not evaluate. I've recreated the code below:

    #Testing rename, select, and mutate use cases for enquo()
    #Load packages
    library(dplyr)
    library
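
If the right-hand side of mutate() uses the string itself, dplyr treats it as a constant character value, which would explain the column filled with the text. One sketch, assuming the new name arrives as a string: `!!name :=` sets the name on the left-hand side, while `.data[[name]]` refers to that column's values on the right. The helper name, the source column x, and the doubling step are hypothetical.

```r
library(dplyr)

# Hypothetical helper: rename column x to new_name (a string), select it,
# then mutate it using the same dynamic name.
rename_select_mutate <- function(df, new_name) {
  df %>%
    rename(!!new_name := x) %>%                       # string -> new column name
    select(all_of(new_name)) %>%                      # select by the string
    mutate(!!new_name := .data[[new_name]] * 2)       # LHS name vs. RHS column values
}

out <- rename_select_mutate(data.frame(x = 1:3, y = 4:6), "score")
print(out)
```

If the name is passed as a bare symbol instead, rlang::as_name(enquo(arg)) converts it to the string this sketch expects.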