Restructure data in R: reshape, dcast, melt…nothing seems to work for this dataframe

Posted by 二次信任 on 2021-02-16 18:11:08

Question


Here is an example of what the first few rows of my imported data frame look like. In the full dataset the subject variable has five levels in total; the two not shown here are Algebra II and Geometry.

SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       Algebra I     1043          7

I want to restructure it from the long format above (where each person has two rows) to a wide format (where each person has one row). For example, the first row of the new data would contain:

SID     firstName  lastName  sumScaleScore_Mathematics  sumPerformanceLevel_Mathematics  sumScaleScore_ELA  sumPerformanceLevel_ELA
604881  JIM        Ro        912                        2                                964                4

I've tried reshape2's melt and dcast, plus some other packages, and read some help files, but my code just isn't cutting it. SPSS does this quite easily using "casestovars," but I'm new to R and having no luck. Any ideas?


Answer 1:


Melt the data using the first four columns as id variables, then use dcast:

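library(reshape2)
# Melt to long form, keeping the first four columns (SID, firstName, lastName, subject) as ids
m <- melt(DF, id = 1:4)
# Cast to wide form: one row per person, one column per subject/measure pair
dcast(m, SID + firstName + lastName ~ ...)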

giving:

     SID firstName lastName AlgebraI_sumScaleScore AlgebraI_sumPerformanceLevel
1 564711      KILE    Gamma                   1043                            7
2 594181     JERRY      Chi                     NA                           NA
3 604881       JIM       Ro                     NA                           NA
  ELA_sumScaleScore ELA_sumPerformanceLevel Mathematics_sumScaleScore
1               933                       5                        NA
2               997                       1                       918
3               964                       4                       912
  Mathematics_sumPerformanceLevel
1                              NA
2                               3
3                               2

Note: We used this input:

Lines <- "SID   firstName lastName    subject       sumScaleScore sumPerformanceLevel
604881  JIM     Ro          Mathematics   912           2
604881  JIM     Ro          ELA           964           4
594181  JERRY   Chi         ELA           997           1
594181  JERRY   Chi         Mathematics   918           3
564711  KILE    Gamma       ELA           933           5
564711  KILE    Gamma       AlgebraI     1043          7"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
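
The melt/dcast approach above gives one value per cell only if each SID appears at most once per subject. When a cell would hold more than one value and no aggregation function is supplied, reshape2's dcast falls back to counting rows (length) instead of keeping the scores. A minimal base R sketch of a check to run before casting is shown here; the names key and dup_rows are illustrative and not part of the original answer.

```r
# Sample data matching the input used in this answer.
DF <- data.frame(
  SID = c(604881, 604881, 594181, 594181, 564711, 564711),
  firstName = c("JIM", "JIM", "JERRY", "JERRY", "KILE", "KILE"),
  lastName = c("Ro", "Ro", "Chi", "Chi", "Gamma", "Gamma"),
  subject = c("Mathematics", "ELA", "ELA", "Mathematics", "ELA", "AlgebraI"),
  sumScaleScore = c(912, 964, 997, 918, 933, 1043),
  sumPerformanceLevel = c(2, 4, 1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# The pair (SID, subject) should identify a row uniquely.
key <- DF[c("SID", "subject")]

# Collect every row that shares its (SID, subject) pair with another row.
dup_rows <- DF[duplicated(key) | duplicated(key, fromLast = TRUE), ]

# Zero rows here means the cast can place exactly one value per cell.
nrow(dup_rows)
```

If dup_rows is not empty, those rows need to be resolved (for example by keeping the most recent attempt) before reshaping to wide form.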



Answer 2:


The dcast function has been reworked in the "data.table" package and now accepts multiple value.vars.

One big change is that you can directly cast multiple columns to a wide form without first having to melt the data, making the process far more efficient than the present reshape2 approach.

library(data.table)
dcast(as.data.table(DF), ... ~ subject, value.var = c("sumScaleScore", "sumPerformanceLevel"))
##       SID firstName lastName sumScaleScore_AlgebraI sumScaleScore_ELA
## 1: 564711      KILE    Gamma                   1043               933
## 2: 594181     JERRY      Chi                     NA               997
## 3: 604881       JIM       Ro                     NA               964
##    sumScaleScore_Mathematics sumPerformanceLevel_AlgebraI sumPerformanceLevel_ELA
## 1:                        NA                            7                       5
## 2:                       918                           NA                       1
## 3:                       912                           NA                       4
##    sumPerformanceLevel_Mathematics
## 1:                              NA
## 2:                               3
## 3:                               2



Answer 3:


Here's an alternative using reshape() in base R, where df is the data frame from the question (with "Algebra I" kept as a single value):

reshape(df, direction = "wide", idvar = c("SID", "firstName", "lastName"), timevar = "subject")

#      SID firstName lastName sumScaleScore.Mathematics sumPerformanceLevel.Mathematics sumScaleScore.ELA
# 1 604881       JIM       Ro                       912                               2               964
# 3 594181     JERRY      Chi                       918                               3               997
# 5 564711      KILE    Gamma                        NA                              NA               933
#   sumPerformanceLevel.ELA sumScaleScore.Algebra I sumPerformanceLevel.Algebra I
# 1                       4                      NA                            NA
# 3                       1                      NA                            NA
# 5                       5                    1043                             7
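
The question asked for column names joined with an underscore (for example sumScaleScore_Mathematics), while the call above joins them with a dot. A small sketch of one way to get there is to pass the sep argument of reshape(), which is also used to build the wide column names. The data frame below is re-typed from the question's sample rows, and the gsub() step that strips the space from "Algebra I" is an optional assumption added here so the resulting names are syntactic.

```r
# Sample rows re-typed from the question.
df <- data.frame(
  SID = c(604881, 604881, 594181, 594181, 564711, 564711),
  firstName = c("JIM", "JIM", "JERRY", "JERRY", "KILE", "KILE"),
  lastName = c("Ro", "Ro", "Chi", "Chi", "Gamma", "Gamma"),
  subject = c("Mathematics", "ELA", "ELA", "Mathematics", "ELA", "Algebra I"),
  sumScaleScore = c(912, 964, 997, 918, 933, 1043),
  sumPerformanceLevel = c(2, 4, 1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# Optional (assumption): drop spaces so "Algebra I" yields a syntactic column name.
df$subject <- gsub(" ", "", df$subject, fixed = TRUE)

# Reshape to wide form; sep controls how measure and subject names are joined.
wide <- reshape(
  df,
  direction = "wide",
  idvar = c("SID", "firstName", "lastName"),
  timevar = "subject",
  sep = "_"
)

names(wide)
```

Subjects a student did not take still come out as NA, exactly as in the dotted version.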

And if you're using reshape2, then you can combine melt and dcast into one function with recast:

recast(df, SID + firstName + lastName ~ ..., id.var = 1:4)

#      SID firstName lastName Algebra I_sumScaleScore Algebra I_sumPerformanceLevel ELA_sumScaleScore
# 1 564711      KILE    Gamma                    1043                             7               933
# 2 594181     JERRY      Chi                      NA                            NA               997
# 3 604881       JIM       Ro                      NA                            NA               964
#   ELA_sumPerformanceLevel Mathematics_sumScaleScore Mathematics_sumPerformanceLevel
# 1                       5                        NA                              NA
# 2                       1                       918                               3
# 3                       4                       912                               2


Source: https://stackoverflow.com/questions/34444703/restructure-data-in-r-reshape-dcast-melt-nothing-seems-to-work-for-this-dat
