问题
I have an excel file I need to read that has multiple observations and values per row, with complicated names. It looks something like this when you load in:
library(tidyverse)
library(janitor)
# An input table read from xlsx, with a format similar to this
# An input table read from xlsx, with a format similar to this
input_table <- tribble(~"product" ,
~"Price Store 1 ($1000/unit)",
~"Quantity in Store 1 (units)",
~"Price Store 2 ($1000/unit)",
~"Quantity in Store 2 (units)",
'product a', 10, 100, 20, 70,
'product b', 30, 10, 35, 10)
I want to use some form of gather/pivot_longer to make it tidy, and have an output that looks like this:
# Desired output
output_table <- tribble(~'product',~'store',~'price',~'quantity',
'product a', 1, 10, 100,
'product a', 2, 20, 70,
'product b', 1, 30, 10,
'product b', 2, 35, 10)
Is there an easy way to get there using pivot_longer
? Extracting the key number (in this case, store) would probably need some complex regex that I don't know how to create.
回答1:
Yes, we can do
tidyr::pivot_longer(input_table,
cols = -product,
names_to = c(".value", "Store"),
names_pattern = "(\\w+).*?(\\d)")
# product Store Price Quantity
# <chr> <chr> <dbl> <dbl>
#1 product a 1 10 100
#2 product a 2 20 70
#3 product b 1 30 10
#4 product b 2 35 10
We get the column names (Price
or Quantity
) along with store number using names_pattern
. The first word (\\w+
) is the column name whereas first digit coming after it (\\d
) is considered as store number.
回答2:
We can use the names_pattern
in pivot_longer
to match one or more letters followed by characters that are not a digit and capture the digit
library(tidyr)
pivot_longer(input_table, cols = -product,
names_to = c(".value", "Store"),
names_pattern = "([A-Za-z]+)[^0-9]+([0-9])")
# A tibble: 4 x 4
# product Store Price Quantity
# <chr> <chr> <dbl> <dbl>
#1 product a 1 10 100
#2 product a 2 20 70
#3 product b 1 30 10
#4 product b 2 35 10
来源:https://stackoverflow.com/questions/59277040/tidyr-pivot-longer-handling-multiple-observations-and-values-per-row