Applying a function to combinations of groups, holding one group fixed

Question

I have some data which looks like:

   grp    date                id              Y
   <chr>  <dttm>              <chr>       <dbl>
 1 group1 2020-09-01 00:00:00 04003      17039.
 2 group1 2020-09-01 00:00:00 04006      13233.
 3 group1 2020-09-01 00:00:00 04011_AM    7918.
 4 group1 2020-09-01 00:00:00 0401301_AD 22586.
 5 group1 2020-09-01 00:00:00 0401303    20527.
 6 group1 2020-09-01 00:00:00 0401305    29422.
 7 group2 2020-09-01 00:00:00 22017_AM    7088.
 8 group2 2020-09-01 00:00:00 22021_AM    8134.
 9 group2 2020-09-01 00:00:00 22039_AM   15842.
10 group2 2020-09-01 00:00:00 22048      16142.

The data contains several groups. I also have a function:

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}
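
For illustration, a quick check of what the function returns on made-up vectors (the numbers below are hypothetical and not taken from the data). It also shows the one edge case to keep in mind: if all values are equal, the range is zero and the result is NaN.

```r
normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Made-up values: the minimum maps to 0, the maximum to 1
res <- normaliseData(c(2, 4, 6))
res
# [1] 0.0 0.5 1.0

# Zero range: (5 - 5) / (5 - 5) is 0 / 0
flat <- normaliseData(c(5, 5, 5))
flat
# [1] NaN NaN NaN
```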

I want to normalise Y using the min and max computed over pairs of groups, always holding group1 fixed. That is, the normalisation should be applied separately to each of the following combinations:

group1 & group2
group1 & group3
group1 & group4

Data:

data <- structure(list(grp = c("group1", "group1", "group1", "group1", 
"group1", "group1", "group2", "group2", "group2", "group2", "group2", 
"group2", "group3", "group3", "group3", "group3", "group3", "group3", 
"group4", "group4", "group4", "group4", "group4", "group4"), 
    date = structure(c(1598918400, 1598918400, 1598918400, 1598918400, 
    1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
    1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
    1598918400, 1598918400, 1598918400, 1598918400, 1598918400, 
    1598918400, 1598918400, 1598918400, 1598918400, 1598918400
    ), tzone = "UTC", class = c("POSIXct", "POSIXt")), id = c("04003", 
    "04006", "04011_AM", "0401301_AD", "0401303", "0401305", 
    "22017_AM", "22021_AM", "22039_AM", "22048", "22053_AM", 
    "22054_AM", "28002", "28004", "2800501", "2800502", "2800503", 
    "2800504", "31010_AM", "31015_AM", "31016", "31019_AM", "31023", 
    "31029_AM"), Y = c(17039.329, 13232.982, 7917.693, 22585.676, 
    20527.113, 29422.471, 7087.536, 8134.265, 15842.035, 16142.111, 
    11493.981, 6556.387, 22086.768, 11325.882, 53449.067, 83662.101, 
    78508.089, 66107.125, 5095.169, 5590.531, 17796.439, 6028.701, 
    39271.698, 3642.281)), row.names = c(NA, -24L), groups = structure(list(
    grp = c("group1", "group2", "group3", "group4"), .rows = structure(list(
        1:6, 7:12, 13:18, 19:24), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

EDIT:

I am looking to apply the following:

# Min / max from group1 and group2
data %>%
  filter(grp == "group1" | grp == "group2") %>%
  mutate(
    normedOut = normaliseData(Y)
  )

# Min / max from group1 and group3
data %>%
  filter(grp == "group1" | grp == "group3") %>%
  mutate(
    normedOut = normaliseData(Y)
  )

# Min / max from group1 and group4
data %>%
  filter(grp == "group1" | grp == "group4") %>%
  mutate(
    normedOut = normaliseData(Y)
  )

Answer 1:

Here is one option with purrr, based on what I understand from your question. We create a vector, groups, containing the groups to pair with group1. For each of them we apply your filter and mutate sequence, naming the new column after the paired group. map_dfr row-binds the three results, so the output is a single data frame with 3 new columns (group2, group3, group4), each holding the normalised Y for group1 paired with that group. The group1 rows appear once per pair, and a column is NA on rows that came from a different pair (e.g. the group3 column is NA on the rows from the group1 & group2 pass). Note that the data as posted is grouped by grp, so mutate() computes min and max within each group unless the grouping is removed with ungroup() first.

library(dplyr)

groups <- c("group2", "group3", "group4")
groups %>%
  purrr::map_dfr(~ data %>%
        ungroup() %>%  # data is grouped by grp; ungroup so min/max span both groups
        filter(grp == "group1" | grp == .x) %>%
        mutate(!!.x := normaliseData(Y)))
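
If one column per pair is awkward to work with, a long-format variant may be easier: keep a single normedOut column and record the pair in a separate column. This is only a sketch under assumptions: toy is made-up data standing in for data, and the names others, pair and pairwise are hypothetical. The ungroup() call is there so that min and max are taken over both groups of the pair rather than within each group.

```r
library(dplyr)

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Made-up stand-in for `data`, grouped by grp like the original
toy <- tibble(
  grp = rep(c("group1", "group2", "group3"), each = 2),
  Y   = c(10, 20, 0, 40, 15, 30)
) %>%
  group_by(grp)

# Every group except the fixed one
others <- setdiff(unique(toy$grp), "group1")

# One block of rows per pair; `pair` says which group was combined with group1
pairwise <- lapply(others, function(g) {
  toy %>%
    ungroup() %>%                           # min/max must span both groups
    filter(grp %in% c("group1", g)) %>%
    mutate(pair = paste("group1", g, sep = " & "),
           normedOut = normaliseData(Y))
}) %>%
  bind_rows()

pairwise
```

In this layout there are no NA-filled columns, and a later group_by(pair) gives direct access to each combination.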

Source: https://stackoverflow.com/questions/65565164/applying-a-function-to-combinations-of-groups-holding-1-group-fixed

Tags

dplyr