Tidying datasets with multiple sections/headers at variable positions

轻奢々 2021-01-22 16:33

Context

I am trying to read in and tidy an Excel file with multiple headers/sections placed at variable positions. The content of these headers needs to

4 answers
  •  轻奢々 (original poster)
    2021-01-22 17:15

    A data.table option.

    Similar to @camille's answer, I assume you can build a vector of the measure names (meas), so that any col1 value not in that vector is a city. The rows are grouped by cumsum(!col1 %in% meas), a group number that increments by 1 each time col1 is not found in meas. Within each group, city is set to the first value of col1, and col1 and col2 are renamed to type and value. Finally, I keep only the rows where city differs from type, which drops the header rows, and remove the grouping variable g.

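    library(data.table)
    setDT(df)  # convert df to a data.table by reference
    
    # Known measure labels; any other col1 value is treated as a city header
    meas <- c("Diesel", "Gasoline", "LPG", "Electric")
    
    # g increments at every header row, so a city and its measures share one group
    df[, .(city = first(col1), type = col1, value = col2), 
       by = .(g = cumsum(!col1 %in% meas))
      ][city != type, -'g']  # drop the header rows and the helper column g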
    
    #       city     type value
    # 1: Seattle   Diesel    80
    # 2: Seattle Gasoline    NA
    # 3: Seattle      LPG    10
    # 4: Seattle Electric    10
    # 5:  Boston   Diesel    65
    # 6:  Boston Gasoline    25
    # 7:  Boston Electric    10
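
To see why the grouping works, here is a minimal base R sketch of the intermediate steps. The input data frame is an assumption: it is reconstructed from the output printed above, not taken from the asker's actual file, and the helper names is_header, g, city and tidy are only illustrative. It also assumes the first row is a city header.

```r
# Hypothetical input, reconstructed from the output shown above
# (not the asker's real data). Header rows carry a city name and no value.
df <- data.frame(
  col1 = c("Seattle", "Diesel", "Gasoline", "LPG", "Electric",
           "Boston", "Diesel", "Gasoline", "Electric"),
  col2 = c(NA, 80, NA, 10, 10, NA, 65, 25, 10),
  stringsAsFactors = FALSE
)

meas <- c("Diesel", "Gasoline", "LPG", "Electric")

# TRUE on header rows, i.e. rows whose col1 is not a known measure.
is_header <- !df$col1 %in% meas

# Running count of header rows: every row gets the id of the latest header.
g <- cumsum(is_header)

# Look up each row's city from the header that opened its group.
city <- df$col1[is_header][g]

# Attach the city, rename the columns, and drop the header rows.
tidy <- data.frame(city = city, type = df$col1, value = df$col2,
                   stringsAsFactors = FALSE)[!is_header, ]
print(tidy)
```

Printing data.frame(df, is_header, g, city) before the final step shows the group id changing exactly at the Seattle and Boston rows, which is what the by = .(g = cumsum(...)) clause relies on.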
    
