Gather multiple date/value columns using tidyr

前端 未结 5 999
慢半拍i
慢半拍i 2020-12-19 17:14

I have a data set containing (amongst others) multiple columns with dates and corresponding values (repeated measurements). Is there a way to turn this into a long data set

相关标签:
5条回答
  • 2020-12-19 17:21

    The same strategy but using tidyr instead looks as follows:

    df.value <- df %>%
        gather(key="foo", value="value", starts_with("value"))
    df.date <- df %>%
        gather(key="bar", value="date", starts_with("date"))
    

    After controlling the resulting dimensions (careful with NA values - there is also a na.rm argument to the gather function) I joined the data.frames using base/dplyr functions:

    df.long <- data.frame(select(df.value, id, age, value), select(df.date, date))
    

    I am certain there is a much more elgant way to both parts, but it did the trick.

    0 讨论(0)
  • 2020-12-19 17:21

    This does a reshape and then sorts the rows.

    The first two lines just set up the v.names and varying arguments to reshape. v.names defines the new column names and varying, is a list whose two components contain logical selection vectors of the date and value columns respectively.

    The last line of code does the sorting and can be omitted if the row order does not matter.

    No packages are used.

    v.names <- c("date", "value")
    varying <- lapply(v.names, startsWith, x = names(df))
    r <- reshape(df, dir = "long", varying = varying, v.names = v.names)
    r[order(r$id, r$time), ]
    

    giving the following where the id and time columns relate the output rows back to the input:

         id age time       date value
    1.1   1  12    1 2015-08-14     3
    1.2   1  12    2 2015-07-11    24
    1.3   1  12    3 2015-07-04     4
    2.1   2  92    1 2015-08-03    17
    2.2   2  92    2 2015-07-19    52
    2.3   2  92    3 2015-07-01    93
    3.1   3  28    1 2015-08-24    86
    3.2   3  28    2 2015-08-12    80
    3.3   3  28    3 2015-09-01    56
    4.1   4  45    1 2015-09-13    78
    4.2   4  45    2 2015-07-07    92
    4.3   4  45    3 2015-08-10    81
    5.1   5  25    1 2015-08-27    95
    5.2   5  25    2 2015-09-08    68
    5.3   5  25    3 2015-06-27    82
    6.1   6   1    1 2015-08-21    16
    6.2   6   1    2 2015-06-15    35
    6.3   6   1    3 2015-07-24    30
    7.1   7   7    1 2015-07-19    59
    7.2   7   7    2 2015-07-08    33
    7.3   7   7    3 2015-08-11    49
    8.1   8  71    1 2015-07-28    19
    8.2   8  71    2 2015-06-29    74
    8.3   8  71    3 2015-08-05    25
    9.1   9  59    1 2015-07-05    64
    9.2   9  59    2 2015-09-04    30
    9.3   9  59    3 2015-07-30    74
    10.1 10  96    1 2015-09-12    69
    10.2 10  96    2 2015-07-23    72
    10.3 10  96    3 2015-08-19    23
    
    0 讨论(0)
  • 2020-12-19 17:24

    There should be some efficient way, but this is one way.

    Working separately for date and value,

    #for date
    df.date<-df%>%select(id, age,date1,date2, date3)%>%melt(id.var=c("id", "age"), value.name="date")
    #for val
    df.val<-df%>%select(id, age,value1,value2, value3)%>%melt(id.var=c("id", "age"), value.name="value")
    

    Now join,

    df2<-full_join(df.date, df.val, by=c("id", "age"))
    df2%>%select(-variable.x, -variable.y)
    
     id age       date value
    1   1  40 2015-07-19    28
    2   1  40 2015-07-19    49
    3   1  40 2015-07-19    24
    4   2  33 2015-06-27    99
    5   2  33 2015-06-27    18
    6   2  33 2015-06-27    26
    7   3  75 2015-07-07    63
    8   3  75 2015-07-07    74
    9   3  75 2015-07-07    72
    
    0 讨论(0)
  • 2020-12-19 17:26

    I had the exact same question and data format for a dataset I was working on. Crowdsourced the answer at work. A couple of us came up with a single tidyr and dplyr pipeline solution. Using the same simulated df from original question.

    df %>%
        gather(key = date_position, value = date, starts_with("date")) %>%
        gather(key = value_position, value = value, starts_with("value")) %>%
        mutate(date_position = gsub('[^0-9]', "", date_position),
               value_position = gsub('[^0-9]', "", value_position)) %>%
        filter(date_position == value_position) %>%
        select(-ends_with("position")) %>%
        arrange(id)
    
    0 讨论(0)
  • 2020-12-19 17:39

    I stumbled across this trying to learn about using gather with a mix of dates and values.

    The existing answers lose information about which instance the date-value pair came from, ie, instance 1 for date1 & value1, etc. This may not be important, but here's a tidyverse option that keeps the instance.

    library(stringr) # not necessary but nice
    library(tidyr)
    library(dplyr)
    
    df %>% 
        gather(key, val, -id, -age) %>% 
        mutate(
            measure = str_sub(key,1,-2), 
            instance = str_sub(key, -1)
        ) %>% 
        select(-key) %>% 
        spread(measure, val) %>% 
        mutate(date = as.Date(date, origin="1970-01-01")) # restore date class
    
    0 讨论(0)
提交回复
热议问题