Categorising numerical and categorical variables into appropriate ranges in R

梦想与她 submitted on 2019-12-12 04:16:10

Question


 Df <- bball5
 str(bball5)
 'data.frame':  379 obs. of 9 variables:
 $ ID         : int  238 239 240 241 242 243 244 245 246 247 ...
 $ Sex        : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sport      : Factor w/ 10 levels "BBall","Field",..: 1 1 1 1 1 1 1 1 1 1 
 $ Ht         : num  196 190 178 185 185 ...
 $ Wt         : num  78.9 74.4 69.1 74.9 64.6 63.7 75.2 62.3 66.5 62.9 ...
 $ BMI        : num  20.6 20.7 21.9 21.9 19 ...
 $ BMIc       : NA NA NA NA NA NA NA NA NA NA ...
 $ Sex_f      : Factor w/ 1 level "female": 1 1 1 1 1 1 1 1 1 1 ...
 $ Sex_m      : Factor w/ 1 level "male": NA NA NA NA NA NA NA NA NA NA ...

I would like to classify a set of numerical variables within a large dataset of 1000 observations.

I need to classify BMI into the following ranges:

    (<18.50, 18.50-24.99, 25.00-29.99, >=30.00) 

and label them respectively as:

  "Underweight" "Normal" "Overweight" "Obese" 

I want to produce tables that show these relationships separately for males and females, broken down by sport type.

I also need to confirm that the existing BMI column was calculated correctly, but I am finding it difficult to write a formula that creates a new column, BMIc, within the dataset.

Several of the variables contain missing values (NA), and these give me errors when I try to calculate the new variable:

 bball5$BMIc <- bball5$BMI[bball5$BMI, c(bball5$wt/(bball5$Ht)^2 ]

I am unable to classify the BMI values. I must also keep the ID column so that rows can still be matched.


Answer 1:


You can create a variable named BMIclass and do this to create the 4 categories in it:

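bball5$BMIclass <- NA_character_  # rows with a missing BMI stay NA
bball5[which(bball5$BMI < 18.5), 'BMIclass'] <- "Underweight"
bball5[which(bball5$BMI >= 18.5 & bball5$BMI < 25), 'BMIclass'] <- "Normal"
bball5[which(bball5$BMI >= 25 & bball5$BMI < 30), 'BMIclass'] <- "Overweight"
bball5[which(bball5$BMI >= 30), 'BMIclass'] <- "Obese"
# Convert to a factor with the levels in their natural order
bball5$BMIclass <- factor(bball5$BMIclass, levels = c("Underweight", "Normal", "Overweight", "Obese"))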
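
A note on why which() appears in the lines above: comparing against NA yields NA, and which() keeps only the positions that are TRUE, so rows with a missing BMI are skipped instead of being matched. A minimal sketch on a made-up vector (the values are illustrative and not taken from bball5):

```r
# Hypothetical BMI values, including a missing one
bmi <- c(17.2, NA, 22.4, 31.0)

# A comparison with NA yields NA rather than TRUE or FALSE
is_obese <- bmi >= 30

# which() returns only the positions that are TRUE, dropping NA
obese_rows <- which(is_obese)

print(is_obese)
print(obese_rows)
```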

As for BMIc you can do this (below). It will still create some NAs where there are missing values but it will give you the correct BMIc where there is data for it.

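# The column is named Wt (R is case-sensitive); Ht appears to be in cm, so convert to metres
bball5$BMIc <- bball5$Wt / (bball5$Ht / 100)^2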
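
To confirm that the existing BMI column agrees with a recomputed one, the two can be compared row by row while ignoring missing values. This is a minimal sketch on a small made-up data frame, assuming height is stored in centimetres and weight in kilograms (as the str() output suggests). The data frame `df`, the `mismatch` column, and the tolerance of 0.1 are all illustrative choices, not part of the original data:

```r
# Toy stand-in for bball5; values are illustrative only
df <- data.frame(
  ID  = 1:4,
  Ht  = c(196, 190, NA, 185),      # assumed to be in cm
  Wt  = c(78.9, 74.4, 69.1, NA),   # assumed to be in kg
  BMI = c(20.5, 20.6, 21.9, 21.9)
)

# Recompute BMI as kg / m^2; any row with a missing input stays NA
df$BMIc <- df$Wt / (df$Ht / 100)^2

# Flag rows where the stored and recomputed values differ noticeably
df$mismatch <- abs(df$BMI - df$BMIc) > 0.1

print(df)
# Number of rows that disagree, ignoring rows that could not be checked
print(sum(df$mismatch, na.rm = TRUE))
```

Because the calculation is vectorised and never reorders rows, the ID column stays aligned with the new values.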



Answer 2:


I would use cut to transform BMI into a categorical variable. An example on a random BMI vector:

BMI <- runif(100, 16, 35)
BMIc <- cut(BMI, breaks=c(0, 18.5, 25, 30, +Inf), 
            labels=c("Underweight", "Normal", "Overweight", "Obese"),
            right=FALSE)  # left-closed intervals: 18.5 is "Normal", 30 is "Obese"

To check the result, you could use aggregate:

aggregate(BMI, by=list(BMIc), summary)

Finally, the new vector can be added to the data frame with, for example, df$BMIc <- BMIc (provided BMIc was computed from that data frame's own BMI column, so the rows line up).
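
Once the class is stored as a column, counts per sport can be tabulated separately for each sex with table(), leaving the ID column untouched. This is a sketch on made-up data that reuses the column names from the question. The data frame `df` and the column name `BMIclass` are illustrative, and right = FALSE is used on the assumption that 18.5 should count as "Normal" and 30 as "Obese", as the ranges in the question imply:

```r
# Toy stand-in for bball5; values are illustrative only
df <- data.frame(
  ID    = 1:6,
  Sex   = factor(c("female", "female", "male", "male", "female", "male")),
  Sport = factor(c("BBall", "Field", "BBall", "BBall", "BBall", "Field")),
  BMI   = c(18.5, 26.1, 30.0, 22.3, NA, 17.9)
)

# Left-closed intervals: [0,18.5), [18.5,25), [25,30), [30,Inf)
df$BMIclass <- cut(df$BMI, breaks = c(0, 18.5, 25, 30, Inf),
                   labels = c("Underweight", "Normal", "Overweight", "Obese"),
                   right = FALSE)

# Three-way table: Sport x BMIclass, one slice per Sex.
# Rows with a missing BMIclass are left out by default.
tab <- table(df$Sport, df$BMIclass, df$Sex,
             dnn = c("Sport", "BMIclass", "Sex"))

print(tab[, , "female"])
print(tab[, , "male"])
```

Passing useNA = "ifany" to table() would add a separate column for the rows with a missing BMI, if those need to be counted too.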



Source: https://stackoverflow.com/questions/36358322/categorising-numerical-and-categorical-variables-into-appropriate-ranges-in-r
