dplyr

How to combine two data frames using dplyr or other packages?

耗尽温柔 submitted on 2021-02-09 15:40:13
Question: I have two data frames:

    df1 = data.frame(index=c(0,3,4), n1=c(1,2,3))
    df1
    #   index n1
    # 1     0  1
    # 2     3  2
    # 3     4  3

    df2 = data.frame(index=c(1,2,3), n2=c(4,5,6))
    df2
    #   index n2
    # 1     1  4
    # 2     2  5
    # 3     3  6

I want to join these into:

      index n
    1     0 1
    2     1 4
    3     2 5
    4     3 8   (index 3 is in both data frames, so add 2 and 6)
    5     4 3
    6     5 0   (index 5 does not exist in either data frame, so set 0)
    7     6 0   (index 6 does not exist in either data frame, so set 0)

The given data frames are just part of a large dataset. Can I do it using dplyr or other packages?
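
One possible approach, sketched here under assumptions: do a full join on `index`, fill in the missing index values, and sum the two value columns with NA treated as 0. The upper bound of 6 for the index range is taken from the desired output in the question; `all_index` and `result` are illustrative names, and the sketch also needs tidyr for `complete()`.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(index = c(0, 3, 4), n1 = c(1, 2, 3))
df2 <- data.frame(index = c(1, 2, 3), n2 = c(4, 5, 6))

# Assumption: the full index range (0 to 6) is known up front.
all_index <- as.numeric(0:6)

result <- full_join(df1, df2, by = "index") %>%
  complete(index = all_index) %>%                  # add rows for missing indexes
  mutate(n = coalesce(n1, 0) + coalesce(n2, 0)) %>% # treat NA as 0 before adding
  select(index, n) %>%
  arrange(index)
```

For a large dataset the same pattern should apply, with `all_index` built from whatever range the real data is expected to cover.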

Access the column names in the `mutate_at` to use it for subsetting a list

非 Y 不嫁゛ submitted on 2021-02-09 13:58:33
Question: I am trying to recode several variables, each with a different recoding scheme. The schemes are saved in a list where each element is a named vector of the form old = new, one element per variable in the data frame. I am using the mutate_at function together with recode. I think the problem is that I cannot extract the variable name in order to pick the correct recoding scheme from the list. I tried deparse(substitute(.)) as in here, and also this, but it didn't help. Also
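
A minimal sketch of one way to reach the column name, assuming dplyr 1.0.0 or later: it swaps `mutate_at` for `across()`, where `cur_column()` returns the name of the column being processed, and it swaps `recode` for plain named-vector lookup. The data frame `df` and the list `schemes` are made-up illustrations, not the asker's data.

```r
library(dplyr)

# Hypothetical data and recoding schemes (old = new), one scheme per column.
df <- data.frame(a = c("x", "y"), b = c("p", "q"), stringsAsFactors = FALSE)
schemes <- list(
  a = c(x = "X1", y = "Y1"),
  b = c(p = "P1", q = "Q1")
)

recoded <- df %>%
  mutate(across(
    all_of(names(schemes)),
    # cur_column() gives the current column name, used to pick its scheme;
    # indexing the named vector by the old values returns the new values.
    ~ unname(schemes[[cur_column()]][as.character(.x)])
  ))
```

Values that are missing from a scheme come back as NA with this lookup, so a real scheme would need to cover every level that should be kept.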

dplyr: Need help returning column index of first non-NA value in every row

浪尽此生 submitted on 2021-02-08 19:55:32
Question: I've recently started trying to do all of my coding in the tidyverse. This has sometimes led me into difficulties. Here is a simple task that I haven't been able to complete in the tidyverse: I need a column in a data frame that returns the position index of the first non-NA value from the left. Does anyone know how to achieve this in dplyr using mutate? Here is the desired output.

    data.frame( "X1"=c(100,rep(NA,8)), "X2"=c(NA,10,rep(NA,7)), "X3"=c(NA,NA,1000,1000,rep(NA,5)), "X4"=c(rep(NA,4),25
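
A minimal sketch of one possibility, using base R's `max.col()` on the logical matrix `!is.na(df)` and attaching the result with `mutate`. The small data frame below is a made-up stand-in (the question's own data is cut off above), and `first_idx` and `out` are illustrative names.

```r
library(dplyr)

# Hypothetical stand-in data; row 4 is entirely NA.
df <- data.frame(
  X1 = c(100, NA, NA, NA),
  X2 = c(NA, 10, NA, NA),
  X3 = c(NA, NA, 1000, NA)
)

not_na <- !is.na(df)

# Column position of the first TRUE in each row.
first_idx <- max.col(not_na, ties.method = "first")

# max.col() reports 1 for rows without any TRUE, so mark those rows as NA.
first_idx[rowSums(not_na) == 0] <- NA

out <- df %>% mutate(first_non_na = first_idx)
```

The explicit handling of all-NA rows matters because `max.col()` would otherwise report column 1 for them.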

adding hash to each row using dplyr and digest in R

大兔子大兔子 submitted on 2021-02-08 19:45:09
Question: I need to add a fingerprint to each row in a dataset so that I can compare it with a later version of the same set and look for differences. I know how to add a hash for each row in R, like below:

    data.frame(iris,hash=apply(iris,1,digest))

I am learning to use dplyr because the dataset is getting huge and I need to store it in SQL Server. I tried something like the code below, but the hash is not working; all rows get the same hash:

    iris %>% rowwise() %>% mutate(hash=digest(.))

Any clue for row-wise hashing using dplyr?
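
A likely reason for the identical hashes is that `.` in the pipe refers to the whole data frame rather than the current row, so every row hashes the same object. One hedged workaround, sketched below, is to build one string per row and hash each string. The separator and the names `row_keys` and `hashed` are illustrative choices, and the resulting hashes will not match those from the `apply` version because the hashed input differs.

```r
library(dplyr)
library(digest)

# One string per row: paste all columns together with a separator.
row_keys <- do.call(paste, c(iris, sep = "|"))

# Hash each row string separately; vapply guarantees a character vector.
hashed <- iris %>%
  mutate(hash = vapply(row_keys, digest, character(1), USE.NAMES = FALSE))
```

Rows with identical contents get identical hashes, which is what a fingerprint comparison between two versions of a dataset relies on.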

Unable to subset (filter) a data frame due to NA's

China☆狼群 submitted on 2021-02-08 19:12:12
Question: Why doesn't dplyr's filter return the same data.frame as base R subsetting in the code below? In fact, neither of them works as expected. I'd like to remove the observations/rows for which, simultaneously, b==1 AND c==1. That is, I'd like to remove only the third row.

    require(dplyr)
    df <- data.frame(a=c(0,0,0,0,1,1,1), b=c(0,0,1,1,0,0,1), c=c(1,NA,1,NA,1,NA,NA))
    filter(df, !(b==1 & c==1))
    df[!(df$b==1 & df$c==1),]

Answer 1: Or use complete.cases to convert NA to FALSE in the resulting logical vector so that you
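
The behaviour in the question comes from NA propagation: when b is 1 and c is NA, `b==1 & c==1` evaluates to NA, and so does its negation. `filter` drops rows whose condition is NA, while base subsetting returns all-NA rows for them. A minimal sketch of two ways to make the condition NA-free, using the question's data (`kept` and `kept_base` are illustrative names):

```r
library(dplyr)

df <- data.frame(a = c(0, 0, 0, 0, 1, 1, 1),
                 b = c(0, 0, 1, 1, 0, 0, 1),
                 c = c(1, NA, 1, NA, 1, NA, NA))

# %in% never returns NA: an NA in c simply counts as "not equal to 1".
kept <- filter(df, !(b == 1 & c %in% 1))

# Base R variant along the lines of the answer: complete.cases() is FALSE
# for rows containing NA, which turns the NA results into FALSE.
kept_base <- df[!(df$b == 1 & df$c == 1 & complete.cases(df)), ]
```

Both variants remove only the third row and keep the remaining six.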

Divide or split dataframe into multiple dfs based on empty row and header title

ⅰ亾dé卋堺 submitted on 2021-02-08 12:09:36
Question: I have a data frame that holds multiple sets of values in a single file. I want to divide it into multiple files, around 25 of them. The pattern in the file is: wherever there is one blank row followed by a header title, a new df starts. I have tried this: Splitting dataframes in R based on empty rows, but it does not handle a blank row inside the new df (V1 column, 9th row). I want the data to be split on an empty row plus a header title. My data and the code I have tried are given below. Also how can
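
A rough sketch of the splitting rule described above, under strong assumptions: the question's data is not shown here, so the example uses a made-up single-column data frame `raw`, assumes blank rows are empty strings, and assumes header titles can be recognised by a pattern (here, hypothetically, lines starting with "Table"). A new block starts only where a header row follows a blank row (or begins the file), so a blank row inside a block does not trigger a split.

```r
# Hypothetical input: two blocks, the first containing an internal blank row.
raw <- data.frame(
  V1 = c("Table A", "1", "", "2", "", "Table B", "3"),
  stringsAsFactors = FALSE
)

is_blank  <- raw$V1 == ""
is_header <- grepl("^Table", raw$V1)       # assumed header pattern

# Was the previous row blank? The first row counts as preceded by a blank.
prev_blank <- c(TRUE, head(is_blank, -1))

# A block starts at a header that directly follows a blank row.
starts <- is_header & prev_blank

# Number the blocks and split the data frame into a list of data frames.
pieces <- split(raw, cumsum(starts))
```

Each element of `pieces` could then be written to its own file; the trailing blank row stays with the preceding block in this sketch and would need to be dropped separately if unwanted.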