Question
The data can be found here: https://www.dropbox.com/s/l7pc11hhiwr8zzn/data.csv?dl=0 , or as nlschools in the MASS package.
I'd like to split this table by the value of nlschools$SES into three tables: one where SES <= 30, one where 30 < SES <= 40, and one where SES > 40, keeping all the columns in each.
I have tried using cut with intervals like 0:30, but the result is very confusing and does not keep the complete set of columns.
I hope what I'm trying to achieve is described clearly enough.
Answer 1:
Try:
one <- subset(nlschools, SES <= 30)
two <- subset(nlschools, SES > 30 & SES <= 40)
three <- subset(nlschools, SES > 40)
Answer 2:
Try this:
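# Bin SES into three right-closed intervals: (-Inf,30], (30,40], (40,Inf]
indx <- with(nlschools, cut(SES, c(-Inf, 30, 40, Inf)))
# Split the full data frame by bin, so every column is kept
lst <- split(nlschools, indx)
lapply(lst, head, 2)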
#$`(-Inf,30]`
# lang IQ class GS SES COMB
#1 46 15.0 180 29 23 0
#2 45 14.5 180 29 10 0
#$`(30,40]`
# lang IQ class GS SES COMB
#37 39 11.0 1082 25 33 1
#39 43 10.5 1280 31 33 1
#$`(40, Inf]`
# lang IQ class GS SES COMB
#49 31 9 1280 31 50 1
#71 45 15 1880 28 50 0
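The boundary handling is what usually makes cut confusing, so here is a minimal sketch of it on its own. The ses vector is made up for illustration (it is not taken from nlschools) and is chosen to sit on and around the break points. With the default right = TRUE, each interval is closed on the right, which matches the <= 30 and <= 40 conditions in the question.

```r
# Hypothetical SES values placed on and around the two break points
ses <- c(10, 30, 31, 40, 41, 50)

# right = TRUE is the default: 30 lands in the first bin and 40 in the
# second, because each interval includes its upper bound
grp <- cut(ses, breaks = c(-Inf, 30, 40, Inf))

# Count how many values fall in each interval
counts <- table(grp)
print(counts)
```

Passing right = FALSE would instead put 30 in the middle bin and 40 in the top one, which would not match the conditions asked for.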
If you need it as separate datasets:
list2env(setNames(lst, c("sesLOW", "sesMED", "sesHIGH")), envir=.GlobalEnv)
# <environment: R_GlobalEnv>
head(sesLOW,3)
# lang IQ class GS SES COMB
#1 46 15.0 180 29 23 0
#2 45 14.5 180 29 10 0
#3 33 9.5 180 29 15 0
Checking the results against @Ujjwal's answer:
identical(sesLOW, one)
#[1] TRUE
identical(sesMED, two)
#[1] TRUE
identical(sesHIGH, three)
#[1] TRUE
However, it would be much easier to do all the analysis/calculations within the list rather than on separate datasets. You can even save the list elements to separate files using lapply with write.table/write.csv, etc.
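As a rough sketch of saving each list element to its own file: the small data frame below is a made-up stand-in for nlschools, and the ses_ file prefix and the use of tempdir() are assumptions for illustration only. It uses Map rather than lapply so that each data frame is paired with its own path.

```r
# Hypothetical stand-in for nlschools so the sketch runs on its own
df <- data.frame(lang = c(46, 39, 31), SES = c(23, 33, 50))
lst <- split(df, cut(df$SES, c(-Inf, 30, 40, Inf),
                     labels = c("low", "med", "high")))

# Assumed output location; a temporary directory avoids cluttering the
# working directory
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("ses_", names(lst), ".csv"))

# Write each data frame in the list to its own CSV file
invisible(Map(function(d, p) write.csv(d, p, row.names = FALSE), lst, paths))
```

Replace out_dir with a real directory to keep the files.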
Update
If you want to create a new column within each element of the list:
names(lst) <- c("low", "med", "high") # renaming is optional; the vector of labels could be passed straight to `Map` instead of names(lst)
lst2 <- Map(function(x, y) {x[,"SEScat"] <- y;x }, lst, names(lst))
lapply(lst2, head,2)
#$low
# lang IQ class GS SES COMB SEScat
#1 46 15.0 180 29 23 0 low
#2 45 14.5 180 29 10 0 low
#$med
# lang IQ class GS SES COMB SEScat
#37 39 11.0 1082 25 33 1 med
#39 43 10.5 1280 31 33 1 med
#$high
# lang IQ class GS SES COMB SEScat
#49 31 9 1280 31 50 1 high
#71 45 15 1880 28 50 0 high
Answer 3:
In response to your comment to @akrun, try:
> ddf$SEScat = with(ddf, ifelse(SES<=30,'low', ifelse(SES<=40, 'med', 'high')))
> ll = split(ddf, ddf$SEScat)
> head(ll[[1]])
X lang IQ class GS SES COMB SEScat
49 49 31 9.0 1280 31 50 1 high
71 71 45 15.0 1880 28 50 0 high
82 82 47 12.0 1880 28 50 0 high
85 85 33 13.0 1880 28 50 0 high
90 90 31 10.5 1880 28 50 0 high
145 145 50 13.5 2680 21 45 0 high
> head(ll[[2]])
X lang IQ class GS SES COMB SEScat
1 1 46 15.0 180 29 23 0 low
2 2 45 14.5 180 29 10 0 low
3 3 33 9.5 180 29 15 0 low
4 4 46 11.0 180 29 23 0 low
5 5 20 8.0 180 29 10 0 low
6 6 30 9.5 180 29 10 0 low
> head(ll[[3]])
X lang IQ class GS SES COMB SEScat
37 37 39 11.0 1082 25 33 1 med
39 39 43 10.5 1280 31 33 1 med
40 40 25 8.5 1280 31 33 1 med
42 42 41 11.0 1280 31 37 1 med
45 45 21 9.5 1280 31 40 1 med
52 52 29 8.5 1280 31 40 1 med
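In the output above the groups come out as high, low, med, because split sorts character values alphabetically. One possible alternative, sketched below, builds SEScat with cut and its labels argument, which gives a factor whose levels stay in low, med, high order. The ddf here is a small made-up data frame, not the real data.

```r
# Hypothetical stand-in for ddf
ddf <- data.frame(lang = c(31, 46, 39, 21), SES = c(50, 23, 33, 40))

# labels= names the three intervals; the factor levels keep the order
# in which the labels are given
ddf$SEScat <- cut(ddf$SES, breaks = c(-Inf, 30, 40, Inf),
                  labels = c("low", "med", "high"))

# split() follows the factor level order rather than alphabetical order
ll <- split(ddf, ddf$SEScat)
names(ll)
```

With this version ll[[1]] holds the low group, and ll$med can be used instead of a positional index.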
Source: https://stackoverflow.com/questions/26235205/how-do-i-subset-split-this-table-bases-on-the-values-of-one-column-in-r