Tidying datasets with multiple sections/headers at variable positions

轻奢々 2021-01-22 16:33

Context

I am trying to read in and tidy an Excel file with multiple headers/sections placed at variable positions. The content of these headers needs to

4 answers

轻奢々 (original poster)

2021-01-22 17:15
A data.table option.

Similar to @camille's answer, this assumes you can build a vector of the known measures; any col1 value that isn't in that vector is treated as a city. The rows are grouped by the cumsum of not (!) col1 %in% meas, i.e. a group number that increments by 1 each time col1 is not found in meas, so every city header row starts a new group. Within each group, city is set to the first value of col1, and col1/col2 are renamed to type/value. Finally, I keep only the rows where city doesn't equal type (which drops the header rows) and remove the grouping variable g.
```
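library(data.table)
setDT(df)  # convert df to a data.table by reference

# Known measure names; any other col1 value is treated as a city header
meas <- c("Diesel", "Gasoline", "LPG", "Electric")

# g increments at each header row, so a city and its measures share one group
df[, .(city = first(col1), type = col1, value = col2),
   by = .(g = cumsum(!col1 %in% meas))
  ][city != type, -'g']  # drop the header rows and the helper column g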

#       city     type value
# 1: Seattle   Diesel    80
# 2: Seattle Gasoline    NA
# 3: Seattle      LPG    10
# 4: Seattle Electric    10
# 5:  Boston   Diesel    65
# 6:  Boston Gasoline    25
# 7:  Boston Electric    10
```
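To make the grouping step easier to follow, here is a minimal step-by-step sketch of the same idea. The input table is an assumption: it is reconstructed from the output shown above (city names and measures both in col1, with no value on the city rows), and the helper column is_header is a name introduced only for illustration.

```r
library(data.table)

# Hypothetical input, reconstructed from the output shown in the answer
df <- data.table(
  col1 = c("Seattle", "Diesel", "Gasoline", "LPG", "Electric",
           "Boston", "Diesel", "Gasoline", "Electric"),
  col2 = c(NA, 80, NA, 10, 10, NA, 65, 25, 10)
)

meas <- c("Diesel", "Gasoline", "LPG", "Electric")

# Step 1: flag the rows whose col1 is not a known measure (the city headers)
df[, is_header := !col1 %in% meas]

# Step 2: a running count of header rows gives one group number per section;
# here g is 1 for the five Seattle rows and 2 for the four Boston rows
df[, g := cumsum(is_header)]

# Step 3: within each group, copy the header value down as city,
# then drop the header rows themselves
tidy <- df[, .(city = col1[1L], type = col1, value = col2), by = g][city != type]

# Step 4: remove the helper grouping column
tidy[, g := NULL]

print(tidy)
```

Splitting the one-liner this way lets you inspect is_header and g before reshaping, which helps when a measure name is missing from meas and a row is wrongly treated as a city.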