Context
I am trying to read in and tidy an Excel file with multiple headers/sections placed at variable positions. The content of these headers need to
Here is one option that relies on the us.cities dataset from the maps package. Match the elements of 'col1' against the first word of the 'name' column of 'us.cities' to detect the city rows, and use the cumulative sum of those matches as a grouping variable. Within each group, store the first element of 'col1' as 'city', then delete the first row (slice(-1)).
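library(maps)
library(dplyr)
library(stringr)

df %>%
  # a new group starts at every row whose col1 contains the first word of a US city name
  group_by(grp = cumsum(str_detect(col1, str_c("\\b(",
      str_c(word(us.cities$name, 1), collapse = "|"), ")\\b")))) %>%
  # the first row of each group is the header, i.e. the city
  mutate(city = first(col1)) %>%
  # drop the header row of each group
  slice(-1) %>%
  ungroup() %>%
  select(city, type = col1, value = col2)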
# A tibble: 7 x 3
#  city    type     value
#1 Seattle Diesel      80
#2 Seattle Gasoline    NA
#3 Seattle LPG         10
#4 Seattle Electric    10
#5 Boston  Diesel      65
#6 Boston  Gasoline    25
#7 Boston  Electric    10
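To see what the grouping step does on its own, here is a minimal base-R sketch. The input 'df' is an assumption reconstructed from the printed result (the original data is not shown), and 'cities' is a hypothetical two-element stand-in for word(us.cities$name, 1), so the sketch runs without the maps package.

```r
# Hypothetical input, reconstructed from the printed result above.
df <- data.frame(
  col1 = c("Seattle", "Diesel", "Gasoline", "LPG", "Electric",
           "Boston", "Diesel", "Gasoline", "Electric"),
  col2 = c(NA, 80, NA, 10, 10, NA, 65, 25, 10),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for word(us.cities$name, 1).
cities  <- c("Seattle", "Boston")
pattern <- paste0("\\b(", paste(cities, collapse = "|"), ")\\b")

# TRUE on header (city) rows; the running sum gives each section its own id.
is_header <- grepl(pattern, df$col1, perl = TRUE)
grp <- cumsum(is_header)
# grp is 1 1 1 1 1 2 2 2 2

# Look up each row's city by its group id, then drop the header rows.
city   <- df$col1[is_header][grp]
result <- data.frame(city = city, type = df$col1, value = df$col2,
                     stringsAsFactors = FALSE)[!is_header, ]
```

The lookup by group id assumes the first row is a header row; rows before the first header would get group 0 and need separate handling.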
Another option skips the grouping: extract the city name with str_extract, carry it down to the following rows with fill (from tidyr), and then remove the header rows, as in the other post
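library(tidyr)  # needed for fill()

df %>%
  # city is the matched name on header rows and NA everywhere else
  mutate(city = str_extract(col1, str_c("\\b(",
      str_c(word(us.cities$name, 1), collapse = "|"), ")\\b"))) %>%
  # carry the last seen city down to the rows below it
  fill(city) %>%
  # drop the header rows themselves
  filter(col1 != city) %>%
  select(city, type = col1, value = col2)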
NOTE: This also works if 'col1' contains hundreds of other elements besides the city names, provided none of them match the city pattern. Only US cities are considered here; if the data also includes cities from other countries, use the world.cities dataset from the same package.