R code to categorize age into group/ bins/ breaks

僤鯓⒐⒋嵵緔 提交于 2019-12-17 06:11:56

问题


I am trying to categorize age into group so it will not be continuous. I have this code:

data$agegrp(data$age>=40 & data$age<=49) <- 3
data$agegrp(data$age>=30 & data$age<=39) <- 2
data$agegrp(data$age>=20 & data$age<=29) <- 1

the above code is not working under survival package. It's giving me:

invalid function in complex assignment

Can you point me where the error is? data is the dataframe I am using.


回答1:


I would use findInterval() here:

First, make up some sample data

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

Use findInterval() to categorize your "ages" vector.

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)



回答2:


This answer provides two ways to solve the problem using the data.table package, which would greatly improve the speed of the process. This is crucial if one is working with large data sets.

1s Approach: an adaptation of the previous answer but now using data.table + including labels:

library(data.table)

agebreaks <- c(0,1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,500)
agelabels <- c("0-1","1-4","5-9","10-14","15-19","20-24","25-29","30-34",
               "35-39","40-44","45-49","50-54","55-59","60-64","65-69",
               "70-74","75-79","80-84","85+")

setDT(data)[ , agegroups := cut(age, 
                                breaks = agebreaks, 
                                right = FALSE, 
                                labels = agelabels)]

2nd Approach: This is a more wordy method, but it also makes it more clear what exactly falls within each age group:

setDT(data)[age <1, agegroup := "0-1"]
data[age >0 & age <5, agegroup := "1-4"]
data[age >4 & age <10, agegroup := "5-9"]
data[age >9 & age <15, agegroup := "10-14"]
data[age >14 & age <20, agegroup := "15-19"]
data[age >19 & age <25, agegroup := "20-24"]
data[age >24 & age <30, agegroup := "25-29"]
data[age >29 & age <35, agegroup := "30-34"]
data[age >34 & age <40, agegroup := "35-39"]
data[age >39 & age <45, agegroup := "40-44"]
data[age >44 & age <50, agegroup := "45-49"]
data[age >49 & age <55, agegroup := "50-54"]
data[age >54 & age <60, agegroup := "55-59"]
data[age >59 & age <65, agegroup := "60-64"]
data[age >64 & age <70, agegroup := "65-69"]
data[age >69 & age <75, agegroup := "70-74"]
data[age >74 & age <80, agegroup := "75-79"]
data[age >79 & age <85, agegroup := "80-84"]
data[age >84, agegroup := "85+"]

Although the two approaches should give the same result, I prefer the 1st one for two reasons. (a) It is shorter to write and (2) the age groups are ordered in the correct way, which is crucial when it comes to visualizing the data.




回答3:


Let's say that your ages were stored in the dataframe column labeled age. Your dataframe is df, and you want a new column age_grouping containing the "bucket" that your ages fall in.

In this example, suppose that your ages ranged from 0 -> 100, and you wanted to group them every 10 years. The following code would accomplish this by storing these intervals in a new age grouping column:

df$age_grouping <- cut(df$age, c(0:100, 10))



回答4:


myData$age_grp <- myData$age
myData$age_grp <- ifelse((myData$age>=10 & myData$age<=18) , 'minnor',myData$age_grp)
myData$age_grp <- ifelse((myData$age>18 & myData$age<=21) , 'junior',myData$age_grp)
myData$age_grp <- ifelse((myData$age>21 & myData$age<=25) , 'major_1',myData$age_grp)
myData$age_grp <- ifelse((myData$age>25 & myData$age<=30) , 'major_2',myData$age_grp)
myData$age_grp <- ifelse((myData$age>30 & myData$age<=40) , 'major_3',myData$age_grp)
myData$age_grp <- ifelse((myData$age>40 & myData$age<=55) , 'major_4',myData$age_grp)
myData$age_grp <- ifelse((myData$age>55) , 'minnor',myData$age_grp)
myData$age_grp<-as.factor(myData$age_grp)
summary(myData$age_grp)
library(dplyr)
myData <- select(myData, -(age) )


来源:https://stackoverflow.com/questions/12979456/r-code-to-categorize-age-into-group-bins-breaks

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!