Long to wide format with several duplicates. Circumvent with unique combo of columns

徘徊边缘 提交于 2021-02-10 19:53:24

问题


I have a dataset similar to this (real one is way bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are a lot of different combinations of time, drug, unit and admin. Only a combination of time, drug, unit and admin will be unique and should only occur once pr id. I could not find a solution to this. I would like R to create unique combinations of columns so the data can be transformed to wide format. I have tried

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

and also a combination with

%>% expand(nesting(time, drug, unit, admin, dose), id)

but it doesn't work. Here is mock data:

id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
df<-rbind(id,time,drug,unit,admin,dose)
df<-t(df)
df<-as.data.table(df)

I would like my output to be something like this (the reason for the TRUE in Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit is missing here but because it is listed it is given in standard dose and unit:

id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)

回答1:


I agree with the comments that long format is usually the better way to go. If you have to use wide format the using the tidyr package you can do the following:

library(tidyr)
df %>% 
  unite(combination, time, drug, unit, admin) %>% 
  spread(key = combination, value  = dose)



回答2:


library(data.table)
id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
df <- rbind(id, time, drug, unit, admin, dose)
df <- t(df)
df <- as.data.table(df)
df
#>      id        time drug  unit  admin dose
#> 1: 1492  Pre-bypass  ACE    NA     NA   NA
#> 2: 1492 Post-bypass LEVO ml/hr     IV   50
#> 3: 1492       Total LEVO    ml     IV   40
#> 4: 1492 Post-bypass  MIL    mg Inhale    5
#> 5: 1493      Pre-OP   BB    NA     NA   NA
#> 6: 1493      Pre-OP   BC    NA     NA   NA

Convert to wide with data.table package function dcast

data.table::dcast(df, id ~ ..., value.var = "dose")
#>      id Post-bypass_LEVO_ml/hr_IV Post-bypass_MIL_mg_Inhale
#> 1: 1492                        50                         5
#> 2: 1493                        NA                        NA
#>    Pre-OP_BB_NA_NA Pre-OP_BC_NA_NA Pre-bypass_ACE_NA_NA Total_LEVO_ml_IV
#> 1:              NA              NA                   NA               40
#> 2:              NA              NA                   NA               NA


来源:https://stackoverflow.com/questions/49909940/long-to-wide-format-with-several-duplicates-circumvent-with-unique-combo-of-col

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!