Question
I have some data which looks like:
   grp    date                id              Y
   <chr>  <dttm>              <chr>       <dbl>
 1 group1 2020-09-01 00:00:00 04003      17039.
 2 group1 2020-09-01 00:00:00 04006      13233.
 3 group1 2020-09-01 00:00:00 04011_AM    7918.
 4 group1 2020-09-01 00:00:00 0401301_AD 22586.
 5 group1 2020-09-01 00:00:00 0401303    20527.
 6 group1 2020-09-01 00:00:00 0401305    29422.
 7 group2 2020-09-01 00:00:00 22017_AM    7088.
 8 group2 2020-09-01 00:00:00 22021_AM    8134.
 9 group2 2020-09-01 00:00:00 22039_AM   15842.
10 group2 2020-09-01 00:00:00 22048      16142.
The data contain several groups. I also have a function:
normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}
I want to normalise each group using the min and max of the pairwise values, holding group1 fixed. That is, the normalisation should be applied to the following combinations:
group1 & group2
group1 & group3
group1 & group4
Data:
data <- structure(list(grp = c("group1", "group1", "group1", "group1",
"group1", "group1", "group2", "group2", "group2", "group2", "group2",
"group2", "group3", "group3", "group3", "group3", "group3", "group3",
"group4", "group4", "group4", "group4", "group4", "group4"),
date = structure(c(1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400,
1598918400, 1598918400, 1598918400, 1598918400, 1598918400
), tzone = "UTC", class = c("POSIXct", "POSIXt")), id = c("04003",
"04006", "04011_AM", "0401301_AD", "0401303", "0401305",
"22017_AM", "22021_AM", "22039_AM", "22048", "22053_AM",
"22054_AM", "28002", "28004", "2800501", "2800502", "2800503",
"2800504", "31010_AM", "31015_AM", "31016", "31019_AM", "31023",
"31029_AM"), Y = c(17039.329, 13232.982, 7917.693, 22585.676,
20527.113, 29422.471, 7087.536, 8134.265, 15842.035, 16142.111,
11493.981, 6556.387, 22086.768, 11325.882, 53449.067, 83662.101,
78508.089, 66107.125, 5095.169, 5590.531, 17796.439, 6028.701,
39271.698, 3642.281)), row.names = c(NA, -24L), groups = structure(list(
grp = c("group1", "group2", "group3", "group4"), .rows = structure(list(
1:6, 7:12, 13:18, 19:24), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
EDIT:
I am looking to apply the following:
# Min / max from group1 and group2
data %>%
  filter(grp == "group1" | grp == "group2") %>%
  mutate(
    normedOut = normaliseData(Y)
  )
# Min / max from group1 and group3
data %>%
  filter(grp == "group1" | grp == "group3") %>%
  mutate(
    normedOut = normaliseData(Y)
  )
# Min / max from group1 and group4
data %>%
  filter(grp == "group1" | grp == "group4") %>%
  mutate(
    normedOut = normaliseData(Y)
  )
Answer 1:
Here is one option with purrr, based on what I understand from your question. We create a vector, groups, containing the groups to loop over, one for each of the three pairs that hold group1 fixed. For each element we apply your filter and mutate sequence, creating a column named after that group which holds the normalised data. The row-bound result is a data frame with 3 new columns, each containing the normalised Y for group1 paired with one other group. NAs fill the rows that do not belong to a given pair (e.g. the group2 column is NA in the rows that come from the group1 & group3 pair). Because the pairs are stacked, the group1 rows appear once per pair.
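library(dplyr)

groups <- c("group2", "group3", "group4")
groups %>%
  purrr::map_dfr(~ data %>%
    ungroup() %>%  # data is a grouped_df (by grp); without this, min/max are taken per group
    filter(grp == "group1" | grp == .x) %>%
    mutate(!!.x := normaliseData(Y)))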
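
If one column per pair is awkward to work with, a long-format variant may be easier: keep a single normedOut column and record which group was paired with group1 in a separate column. The following is only a sketch under stated assumptions: toy is a made-up stand-in for the question's data, and the names others, normed and pair_with are hypothetical. The ungroup() call is there because the dput() output in the question is a grouped_df, so without it the min and max would be taken within each group, not across the pair.

```r
library(dplyr)
library(purrr)

normaliseData <- function(m) {
  (m - min(m)) / (max(m) - min(m))
}

# Toy stand-in for the question's data (values are invented for illustration)
toy <- tibble(
  grp = rep(c("group1", "group2", "group3"), each = 2),
  Y   = c(10, 20, 0, 40, 15, 30)
) %>%
  group_by(grp)  # grouped by grp, like the data in the question

# Every group except the fixed one
others <- setdiff(unique(toy$grp), "group1")

normed <- others %>%
  set_names() %>%                              # name each element after its group
  map(~ toy %>%
        ungroup() %>%                          # so min/max span both groups of the pair
        filter(grp %in% c("group1", .x)) %>%
        mutate(normedOut = normaliseData(Y))) %>%
  bind_rows(.id = "pair_with")                 # list names become the pair_with column

normed
```

Each pair contributes its own copy of the group1 rows, so filtering on pair_with recovers exactly one of the three pairwise results from the question's EDIT.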
Source: https://stackoverflow.com/questions/65565164/applying-a-function-to-combinations-of-groups-holding-1-group-fixed