Question
I have a data frame with two columns, low and high. I would like to create a new variable that is a randomly selected value between low and high (inclusive, with equal probability) using dplyr. I have tried:
library(tidyverse)
data_frame(low = 1:10, high = 11) %>%
  mutate(rand_btwn = base::sample(seq(low, high, by = 1), size = 1))
which gives me an error, since seq() expects scalar arguments.
I then tried again using a vectorized version of seq():
seq2 <- Vectorize(seq.default, vectorize.args = c("from", "to"))
data_frame(low = 1:10, high = 11) %>%
  mutate(rand_btwn = base::sample(seq2(low, high, by = 1), size = 1))
but this does not give me the desired result either.
Answer 1:
To avoid the rowwise() pattern, I usually prefer to use map() inside mutate(), like:
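set.seed(123)
# tibble() replaces the deprecated data_frame()
tibble(low = 1:10, high = 11) %>%
  # map2() builds one sequence per row; map_int() draws one value from each
  mutate(rand_btwn = map_int(map2(low, high, seq), sample, size = 1))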
# # A tibble: 10 x 3
#      low  high rand_btwn
#    <int> <dbl>     <int>
#  1     1    11         4
#  2     2    11         9
#  3     3    11         6
#  4     4    11        11
#  5     5    11        11
#  6     6    11         6
#  7     7    11         9
#  8     8    11        11
#  9     9    11        10
# 10    10    11        10
or:
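set.seed(123)
# tibble() replaces the deprecated data_frame()
tibble(low = 1:10, high = 11) %>%
  # .x and .y are the low and high values of a single row
  mutate(rand_btwn = map2_int(low, high, ~ sample(seq(.x, .y), 1)))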
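For reference, the second attempt in the question appears to fail because seq2() returns a list of sequences, and sample() then picks one whole sequence from that list instead of one value per row. As far as I can tell, mutate() therefore ends up with a list column rather than one integer per row. A minimal base R sketch of what happens outside the pipeline, reusing the seq2() definition from the question:

```r
# Base R only; mirrors the seq2() definition from the question.
seq2 <- Vectorize(seq.default, vectorize.args = c("from", "to"))

low <- 1:10
high <- 11

seqs <- seq2(low, high, by = 1)   # one sequence per (low, high) pair
is.list(seqs)                     # a list, because the sequences differ in length
lengths(seqs)                     # lengths run from 11 down to 2

picked <- sample(seqs, size = 1)  # picks ONE whole sequence, not one value per row
length(picked)                    # a list holding a single sequence
```

The sampling step has to be applied to each sequence separately, which is what the map2() variants above do.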
Your Vectorize() approach also works, as long as the sampling step is vectorized too:
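# Vectorize the whole "build sequence, then sample" step, not just seq()
sample_v <- Vectorize(function(x, y) sample(seq(x, y), 1))
set.seed(123)
# tibble() replaces the deprecated data_frame()
tibble(low = 1:10, high = 11) %>%
  mutate(rand_btwn = sample_v(low, high))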
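One caveat with sample(seq(x, y), 1): when low equals high, seq() returns a single number n, and sample() treats a single number n >= 1 as shorthand for 1:n (see ?sample). The example data never hits this case, since low is always below high, but other data might. A minimal sketch of a guard, where sample_between() is a hypothetical helper name that is not part of the original answer:

```r
# Hypothetical helper (not from the original answer): draw one value from
# lo..hi by sampling an index, so a length-one range is handled correctly.
sample_between <- function(lo, hi) {
  x <- seq(lo, hi)
  x[sample.int(length(x), size = 1)]
}

set.seed(1)
# With low == high the helper can only return that value
same <- replicate(100, sample_between(5, 5))
all(same == 5)

# A regular range stays within its bounds
draws <- replicate(100, sample_between(3, 7))
all(draws %in% 3:7)
```

Assuming integer-like columns as in the example, it should drop into any of the variants above, for instance map2_int(low, high, sample_between).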
Source: https://stackoverflow.com/questions/47519201/how-to-use-sample-and-seq-in-a-dplyr-pipline