tidyr pivot_longer: handling multiple observations and values per row [duplicate]

Submitted by 江枫思渺然 on 2019-12-25 18:29:00

Question


I have an Excel file to read that has multiple observations and values per row, with complicated column names. Once loaded, it looks something like this:

library(tidyverse)
library(janitor)

# An input table read from xlsx, with a format similar to this
input_table <- tribble(~"product" , 
                       ~"Price Store 1 ($1000/unit)",
                       ~"Quantity in Store 1 (units)",
                       ~"Price Store 2 ($1000/unit)",
                       ~"Quantity in Store 2 (units)",
                       'product a', 10, 100, 20, 70,
                       'product b', 30, 10, 35, 10)

I want to use some form of gather/pivot_longer to make it tidy, and have an output that looks like this:

# Desired output
output_table <- tribble(~'product',~'store',~'price',~'quantity',
                        'product a', 1, 10, 100,
                        'product a', 2, 20, 70,
                        'product b', 1, 30, 10,
                        'product b', 2, 35, 10)

Is there an easy way to get there using pivot_longer? Extracting the key number (in this case, store) would probably need some complex regex that I don't know how to create.


Answer 1:


Yes, pivot_longer can do this with the names_to and names_pattern arguments:

tidyr::pivot_longer(input_table, 
                   cols = -product, 
                   names_to = c(".value", "Store"),
                   names_pattern =  "(\\w+).*?(\\d)")

#  product   Store Price Quantity
#  <chr>     <chr> <dbl>    <dbl>
#1 product a 1        10      100
#2 product a 2        20       70
#3 product b 1        30       10
#4 product b 2        35       10

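names_pattern is a regular expression whose capture groups are matched, in order, to the entries of names_to. The first group, (\\w+), captures the leading word of each column name (Price or Quantity); because it is paired with the special ".value" entry, those words become the names of the output value columns. The second group, (\\d), captures the first digit that follows, which is the store number, and fills the Store column. Note that Store comes out as a character column, as the <chr> in the output shows.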
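To see what the two capture groups pick up, the pattern can be run against the column names directly. This is a minimal sketch in base R, assuming PCRE-style matching (perl = TRUE) behaves the same way as the matching done inside pivot_longer for this pattern; the names cols, captures and hypothetical are illustrative and not part of the original answer.

```r
# Column names copied from the question's input_table
cols <- c("Price Store 1 ($1000/unit)",
          "Quantity in Store 1 (units)",
          "Price Store 2 ($1000/unit)",
          "Quantity in Store 2 (units)")

pattern <- "(\\w+).*?(\\d)"

# regexec() returns the full match followed by each capture group;
# perl = TRUE is an assumption made here so that .*? is treated as a lazy quantifier
m <- regmatches(cols, regexec(pattern, cols, perl = TRUE))

# One row per column name: full match, first group, second group
captures <- do.call(rbind, m)
colnames(captures) <- c("match", "value_column", "store")
captures

# Hypothetical column name with a two-digit store number (not in the question):
# (\\d) would capture only the first digit, so (\\d+) is used instead
hypothetical <- "Price Store 12 ($1000/unit)"
two_digit <- regmatches(hypothetical,
                        regexec("(\\w+).*?(\\d+)", hypothetical, perl = TRUE))[[1]]
two_digit
```

The value_column entries are what ".value" turns into output columns, and the store entries become the values of Store. If store numbers can exceed 9, widening the second group to (\\d+) is probably the safer choice.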




Answer 2:


We can also use names_pattern in pivot_longer with explicit character classes: capture one or more letters, skip one or more characters that are not digits, and then capture the digit.

library(tidyr)
pivot_longer(input_table, cols = -product, 
               names_to = c(".value", "Store"),
                names_pattern =  "([A-Za-z]+)[^0-9]+([0-9])")
# A tibble: 4 x 4
#  product   Store Price Quantity
#  <chr>     <chr> <dbl>    <dbl>
#1 product a 1        10      100
#2 product a 2        20       70
#3 product b 1        30       10
#4 product b 2        35       10
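Both answers return capitalised column names and a character Store column, while the desired output in the question uses lowercase names and a numeric store. A minimal sketch of one way to close that gap with base R follow-up steps is shown here; the variable name tidy is illustrative, and this post-processing is an addition rather than part of either original answer.

```r
library(tidyr)
library(tibble)

# Input table as given in the question
input_table <- tribble(~"product",
                       ~"Price Store 1 ($1000/unit)",
                       ~"Quantity in Store 1 (units)",
                       ~"Price Store 2 ($1000/unit)",
                       ~"Quantity in Store 2 (units)",
                       'product a', 10, 100, 20, 70,
                       'product b', 30, 10, 35, 10)

# Reshape exactly as in the answers above
tidy <- pivot_longer(input_table,
                     cols = -product,
                     names_to = c(".value", "Store"),
                     names_pattern = "(\\w+).*?(\\d)")

# Lowercase the column names to match the desired output
names(tidy) <- tolower(names(tidy))

# The store number is captured from text, so convert it to numeric
tidy$store <- as.numeric(tidy$store)

tidy
```

Newer tidyr releases also offer a names_transform argument for converting the captured values during the pivot; the base R steps are used here only to avoid depending on a particular tidyr version.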


Source: https://stackoverflow.com/questions/59277040/tidyr-pivot-longer-handling-multiple-observations-and-values-per-row
