Tidy up and reshape messy dataset (reshape/gather/unite function)?

最后都变了- 提交于 2019-12-02 05:42:33

Using the melt function from the data.table-package you can use multiple measures by patterns and as a result create more than one value column:

library(data.table)
melt(setDT(df), measure.vars = patterns('_age','gesl','gebdat'),
     value.name = c('age','geslacht','geboortedatum')
     )[, variable := c('patient','partner')[variable]][]

you get:

    id groep_MNC zkhs fbeh relpnst pbb1 pbb2 variable      age geslacht geboortedatum
 1:  3         1    1    1       1    5    5  patient 42.50000        1    1955-12-01
 2:  5         1    1    1       2    2    1  patient 55.16667        1    1943-04-09
 3:  7         1    1    1       1    5    3  patient 40.25000        1    1958-04-10
 4: 10         1    1    1       1    5    3  patient 40.25000        1    1958-04-17
 5: 12         1    1    2       1    5    5  patient 50.66667        1    1947-11-01
 6:  3         1    1    1       1    5    5  partner       NA        2          <NA>
 7:  5         1    1    1       2    2    1  partner 36.50000        1    1962-04-18
 8:  7         1    1    1       1    5    3  partner       NA        2          <NA>
 9: 10         1    1    1       1    5    3  partner 41.33333        2    1957-07-31
10: 12         1    1    2       1    5    5  partner 54.58333        2    1944-06-08

Instead of patterns you could also use a list of column indexes or columnnames.

HTH


Used data:

df <- structure(list(id = c(3L, 5L, 7L, 10L, 12L), 
                     groep_MNC = c(1L, 1L, 1L, 1L, 1L),
                     zkhs = c(1L, 1L, 1L, 1L, 1L),
                     fbeh = c(1L, 1L, 1L, 1L, 2L),
                     pgebdat = c("1955-12-01", "1943-04-09", "1958-04-10", "1958-04-17", "1947-11-01"),
                     p_age = c(42.5, 55.16667, 40.25, 40.25, 50.66667),
                     pgesl = c(1L, 1L, 1L, 1L, 1L),
                     prgebdat = c("<NA>", "1962-04-18", "<NA>", "1957-07-31", "1944-06-08"),
                     pr_age = c(NA, 36.5, NA, 41.33333, 54.58333),
                     prgesl = c(2L, 1L, 2L, 2L, 2L),
                     relpnst = c(1L, 2L, 1L, 1L, 1L),
                     pbb1 = c(5L, 2L, 5L, 5L, 5L),
                     pbb2 = c(5L, 1L, 3L, 3L, 5L)), 
                .Names = c("id", "groep_MNC", "zkhs", "fbeh", "pgebdat", "p_age", "pgesl", "prgebdat", "pr_age", "prgesl", "relpnst", "pbb1", "pbb2"),
                class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!