I have a data set like this:
Users Age
1 2
2 7
3 10
4 3
5 8
6 20
How do I split this data set into 3 data sets, where the first consists of all users with ages 0–5, the second 6–10, and the third 11–15?
You can combine split with cut to do this in a single line of code, avoiding the need to subset with a bunch of different expressions for different data ranges:
split(dat, cut(dat$Age, c(0, 5, 10, 15), include.lowest=TRUE))
# $`[0,5]`
# Users Age
# 1 1 2
# 4 4 3
#
# $`(5,10]`
# Users Age
# 2 2 7
# 3 3 10
# 5 5 8
#
# $`(10,15]`
# [1] Users Age
# <0 rows> (or 0-length row.names)
cut splits up data based on the specified break points, and split splits up a data frame based on the provided categories. If you stored the result of this computation into a list called l, you could access the smaller data frames with l[[1]], l[[2]], and l[[3]], or the more verbose:
l$`[0,5]`
l$`(5,10]`
l$`(10,15]`
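To see what cut produces before split uses it, here is a minimal sketch. It assumes dat is the question's data rebuilt with data.frame (the answer never shows how dat was created), and groups is just an illustrative name for the intermediate factor.

```r
# Assumed reconstruction of the question's data; `dat` is not defined in the answer itself.
dat <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

# cut() maps each age to an interval label; ages outside the breaks become NA.
groups <- cut(dat$Age, c(0, 5, 10, 15), include.lowest = TRUE)
table(groups, useNA = "ifany")

# split() returns a named list with one data frame per interval.
l <- split(dat, groups)
names(l)
sapply(l, nrow)
```

Note that user 6 (Age 20) falls outside every interval, so cut assigns it NA and split leaves that row out of all three data frames.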
First, here's your dataset for my purposes: foo=data.frame(Users=1:6,Age=c(2,7,10,3,8,20))
Here's your first dataset with ages 0–5: subset(foo,Age<=5&Age>=0)
Users Age
1 1 2
4 4 3
Here's your second with ages 6–10: subset(foo,Age<=10&Age>=6)
Users Age
2 2 7
3 3 10
5 5 8
Your third (using subset(foo,Age<=15&Age>=11)) is empty: your last Age observation is over 15.
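If writing one subset call per range gets repetitive, the same three filters could be generated from vectors of bounds. This is only a sketch: lower, upper, and parts are names I've made up for illustration, and it uses plain logical indexing rather than subset so it can sit inside a function.

```r
foo <- data.frame(Users = 1:6, Age = c(2, 7, 10, 3, 8, 20))

# Hypothetical bounds matching the ranges in the question.
lower <- c(0, 6, 11)
upper <- c(5, 10, 15)

# One data frame per (lower, upper) pair, inclusive at both ends.
parts <- Map(function(lo, hi) foo[foo$Age >= lo & foo$Age <= hi, ], lower, upper)
names(parts) <- paste(lower, upper, sep = "-")

parts[["0-5"]]
```

Each element of parts is an ordinary data frame, so parts[["6-10"]] gives the second group and parts[["11-15"]] is the empty third one.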
Note also that fractional ages between 5 and 6 or 10 and 11 (e.g., 5.1, 10.5) would be excluded, as this code matches your question very literally. If you'd want someone with an age less than 6 to go in the first group, just amend that code to subset(foo,Age<6&Age>=0). If you'd prefer a hypothetical person with Age=5.1 in the second group, that group's code would be subset(foo,Age<=10&Age>5).
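Here's a small sketch of how those boundary choices play out. The data frame bar and its fractional ages are hypothetical, made up only to exercise the edges; they aren't from the question.

```r
# Hypothetical data with fractional ages, not from the question.
bar <- data.frame(Users = 1:4, Age = c(5, 5.1, 6, 10.5))

# Literal ranges from the question: 5.1 and 10.5 match neither group.
first_literal  <- subset(bar, Age >= 0 & Age <= 5)
second_literal <- subset(bar, Age >= 6 & Age <= 10)

# Variant A: everyone under 6 goes in the first group.
first_under6 <- subset(bar, Age >= 0 & Age < 6)

# Variant B: everyone over 5 (up to 10) goes in the second group.
second_over5 <- subset(bar, Age > 5 & Age <= 10)
```

With the literal ranges, user 2 (Age 5.1) lands in no group; Variant A puts that user in the first group and Variant B in the second. Variant B is the same boundary rule as the (5,10] interval that cut produces by default.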
Source: https://stackoverflow.com/questions/24707936/how-do-i-split-a-data-frame-based-on-range-of-column-values-in-r