How do I subset/split this table based on the values of one column in R?

Question

The data can be found here: https://www.dropbox.com/s/l7pc11hhiwr8zzn/data.csv?dl=0, or as nlschools in the MASS package.

I'd like to split this table into three tables based on the value of nlschools$SES: one where SES <= 30, one where 30 < SES <= 40, and one where SES > 40, with all the columns kept in each.

I have tried using cut with intervals like 0:30, but the result is very confusing and does not have the complete set of columns remaining.

I hope what I'm trying to achieve is described clearly enough.

Answer 1:

Try:

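library(MASS)  # provides the nlschools data frame
# subset() evaluates the condition inside the data frame, so SES can be used directly
one <- subset(nlschools, SES <= 30)
two <- subset(nlschools, SES > 30 & SES <= 40)
three <- subset(nlschools, SES > 40)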
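
As a quick sanity check on this approach, the sketch below confirms that the three subsets together account for every row and keep every column. It uses a small made-up data frame (`toy`, a hypothetical stand-in for nlschools with only two of its columns) and deliberately includes one missing SES value, since subset() drops rows where the condition evaluates to NA.

```r
# Toy stand-in for nlschools (hypothetical values; one SES is missing on purpose)
toy <- data.frame(lang = c(46, 45, 39, 43, 31, 45),
                  SES  = c(23, 30, 33, 40, 50, NA))

one_toy   <- subset(toy, SES <= 30)
two_toy   <- subset(toy, SES > 30 & SES <= 40)
three_toy <- subset(toy, SES > 40)

# Rows that landed in one of the three subsets
covered <- nrow(one_toy) + nrow(two_toy) + nrow(three_toy)
covered

# Rows that landed nowhere because SES is NA
sum(is.na(toy$SES))
```

If the two numbers do not add up to nrow(toy), the conditions leave a gap or overlap somewhere.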

Answer 2:

Try this:

indx <- with(nlschools,cut(SES, c(-Inf, 30, 40, Inf)))
lst <- split(nlschools, indx)

lapply(lst, head,2)
#$`(-Inf,30]`
#  lang   IQ class GS SES COMB
#1   46 15.0   180 29  23    0
#2   45 14.5   180 29  10    0

#$`(30,40]`
#  lang   IQ class GS SES COMB
#37   39 11.0  1082 25  33    1
#39   43 10.5  1280 31  33    1

#$`(40, Inf]`
#  lang IQ class GS SES COMB
#49   31  9  1280 31  50    1
#71   45 15  1880 28  50    0
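
To see why cut on its own looked confusing in the question, it may help to inspect what it returns. The sketch below uses a small made-up data frame (`toy`, a hypothetical stand-in for nlschools): cut gives back a factor of interval labels, one per row, and split is what turns that factor into complete data frames.

```r
# Toy stand-in for nlschools (hypothetical values, same SES column name)
toy <- data.frame(lang = c(46, 45, 39, 43, 31, 45),
                  SES  = c(23, 30, 33, 40, 50, 41))

# cut() returns a factor with one level per interval, not a data frame.
# Intervals are right-closed by default, so 30 falls in the first group
# and 40 in the second.
grp <- cut(toy$SES, breaks = c(-Inf, 30, 40, Inf))
levels(grp)
table(grp)

# split() uses that factor to divide the rows, keeping every column
parts <- split(toy, grp)
sapply(parts, nrow)
```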

If you need it as separate datasets:

list2env(setNames(lst, c("sesLOW", "sesMED", "sesHIGH")), envir=.GlobalEnv)
# <environment: R_GlobalEnv>


head(sesLOW,3)
#  lang   IQ class GS SES COMB
#1   46 15.0   180 29  23    0
#2   45 14.5   180 29  10    0
#3   33  9.5   180 29  15    0

Checking the results against @Ujjwal's answer:

identical(sesLOW, one)
#[1] TRUE

identical(sesMED, two)
#[1] TRUE

identical(sesHIGH, three)
#[1] TRUE

However, it would be much easier to do all the analysis and calculations within the list rather than on separate datasets. You can even save the list elements separately using lapply with write.table or write.csv.
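
A minimal sketch of both ideas, assuming a small made-up data frame (`toy`) in place of nlschools; the names `lst_toy`, `mean_lang`, and the `ses_*.csv` file names are hypothetical, and the files go to a temporary directory so nothing is written into the working directory.

```r
# Toy stand-in for nlschools (hypothetical values)
toy <- data.frame(lang = c(46, 45, 39, 43, 31, 45),
                  SES  = c(23, 30, 33, 40, 50, 41))
lst_toy <- split(toy, cut(toy$SES, c(-Inf, 30, 40, Inf),
                          labels = c("low", "med", "high")))

# One summary per group, computed directly on the list
mean_lang <- sapply(lst_toy, function(d) mean(d$lang))
mean_lang

# Write each list element to its own CSV file
out_dir <- tempdir()
paths <- file.path(out_dir, paste0("ses_", names(lst_toy), ".csv"))
invisible(lapply(seq_along(lst_toy), function(i) {
  write.csv(lst_toy[[i]], paths[i], row.names = FALSE)
}))
```

The same pattern should carry over to the real `lst` above by replacing `lst_toy` with `lst` and choosing a permanent output directory.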

Update

If you want to create a new column within each element of the list:

# Renaming the list elements is optional; the labels could also be passed to Map directly as a vector
names(lst) <- c("low", "med", "high")
lst2 <- Map(function(x, y) {x[,"SEScat"] <- y;x }, lst, names(lst))
lapply(lst2, head,2)
#$low
#  lang   IQ class GS SES COMB SEScat
#1   46 15.0   180 29  23    0    low
#2   45 14.5   180 29  10    0    low

#$med
#  lang   IQ class GS SES COMB SEScat
#37   39 11.0  1082 25  33    1    med
#39   43 10.5  1280 31  33    1    med

#$high
#  lang IQ class GS SES COMB SEScat
#49   31  9  1280 31  50    1   high
#71   45 15  1880 28  50    0   high

Answer 3:

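In response to your comment to @akrun, try the following (ddf here is the data frame holding the data, e.g. as read from the CSV linked in the question):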

> ddf$SEScat = with(ddf, ifelse(SES<=30,'low', ifelse(SES<=40, 'med', 'high')))
> ll = split(ddf, ddf$SEScat)

> head(ll[[1]])
      X lang   IQ class GS SES COMB SEScat
49   49   31  9.0  1280 31  50    1   high
71   71   45 15.0  1880 28  50    0   high
82   82   47 12.0  1880 28  50    0   high
85   85   33 13.0  1880 28  50    0   high
90   90   31 10.5  1880 28  50    0   high
145 145   50 13.5  2680 21  45    0   high
> head(ll[[2]])
  X lang   IQ class GS SES COMB SEScat
1 1   46 15.0   180 29  23    0    low
2 2   45 14.5   180 29  10    0    low
3 3   33  9.5   180 29  15    0    low
4 4   46 11.0   180 29  23    0    low
5 5   20  8.0   180 29  10    0    low
6 6   30  9.5   180 29  10    0    low
> head(ll[[3]])
    X lang   IQ class GS SES COMB SEScat
37 37   39 11.0  1082 25  33    1    med
39 39   43 10.5  1280 31  33    1    med
40 40   25  8.5  1280 31  33    1    med
42 42   41 11.0  1280 31  37    1    med
45 45   21  9.5  1280 31  40    1    med
52 52   29  8.5  1280 31  40    1    med
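
The groups above come out as high, low, med because ifelse produces character values, and split orders character groups alphabetically. If the low/med/high order matters, one option (a sketch on a made-up data frame `toy`, not the approach used in this answer) is to build the column with cut and explicit labels, which gives a factor whose levels keep the order of the breaks.

```r
# Toy stand-in for ddf (hypothetical values)
toy <- data.frame(lang = c(46, 45, 39, 43, 31, 45),
                  SES  = c(23, 30, 33, 40, 50, 41))

# cut() with labels yields a factor ordered low < med < high by position
toy$SEScat <- cut(toy$SES, breaks = c(-Inf, 30, 40, Inf),
                  labels = c("low", "med", "high"))

# split() follows the factor's level order instead of alphabetical order
ll_toy <- split(toy, toy$SEScat)
names(ll_toy)
```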

Source: https://stackoverflow.com/questions/26235205/how-do-i-subset-split-this-table-bases-on-the-values-of-one-column-in-r

Tags

statistics