How can I take multiple vectors and recode their datatypes in R?

痴心易碎 submitted on 2019-12-22 06:25:17

Question


I'm looking for an elegant way to change multiple vectors' datatypes in R.

I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course.

As it stands, my data is sitting pretty in data.df, like this:

    str(data.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: int  1 1 1 1 1 1 0 0 0 1 ...
    $ ques02: int  0 0 1 1 1 1 1 1 1 1 ...
    $ ques03: int  0 0 1 1 0 0 1 1 0 1 ...
    $ ques04: int  1 0 1 1 1 1 1 1 1 1 ...
    $ ques05: int  0 0 0 0 1 0 0 0 0 0 ...
    $ ques06: int  1 0 1 1 0 1 1 1 1 1 ...
    $ ques07: int  0 0 1 1 0 1 1 0 0 1 ...
    $ ques08: int  0 0 1 1 1 0 1 1 0 1 ...
    $ inst  : num  1 1 1 1 1 1 1 1 1 1 ...

But those ques0x values aren't really integers. Rather, I think it's better to have R treat them as experimental factors. Same goes for the "inst" values.

I'd love to turn all those ints and nums into factors.

Ideally, an elegant solution should produce a dataframe—I call it factorData.df—that looks like this:

    str(factorData.df)
    'data.frame': 426 obs. of  9 variables:
    $ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
    $ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
    $ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
    $ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
    $ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
    $ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
    $ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
    $ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
    $ inst  : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...

I'm fairly certain that whatever solution you folks come up with ought to be easy to generalize to any number n of variables that need reclassifying, and ought to work across the most common conversions (int -> factor and num -> int, for example).
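One possible shape for such a generalization, sketched in base R under the assumption that the columns to convert are picked by name. `convert_cols()` and `toy.df` are hypothetical names made up for illustration; `toy.df` is a tiny invented stand-in with the same column types as data.df, not the real data. The helper applies any one-argument conversion function (such as `factor` or `as.integer`) to the chosen columns and leaves the others alone.

```r
# Hypothetical helper (not part of base R or any package):
# apply a conversion function to the named columns of a data frame.
convert_cols <- function(df, cols, fun) {
  # Assigning into df[cols] keeps the data frame's class, names and row names
  df[cols] <- lapply(df[cols], fun)
  df
}

# Tiny made-up data frame with the same column types as data.df
toy.df <- data.frame(ques01 = c(1L, 0L, 1L),
                     ques02 = c(0L, 0L, 1L),
                     inst   = c(1, 2, 3))

# int -> factor for the question columns
toy.df <- convert_cols(toy.df, c("ques01", "ques02"), factor)

# num -> int for the instructor column
toy.df <- convert_cols(toy.df, "inst", as.integer)

str(toy.df)
```

On the real data, the column names could be collected with something like `grep("^ques", names(data.df), value = TRUE)` instead of being typed out.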

No matter what solution you folks come up with, it's bound to be more elegant than mine, because my current clunky code is just nine separate factor() statements, one for each variable, like this:

    factorData.df$ques01 <- factor(data.df$ques01)
    # ...and so on for ques02 through ques08 and inst

I'm brand-new to R, programming, and stackoverflow. Please be gentle, and thanks in advance for your help!


Answer 1:


This was also answered in R-Help.

I imagine that there's a better way to do it, but here are two options:

    # use a sample data set
    > str(cars)
    'data.frame':   50 obs. of  2 variables:
     $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
     $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
    > data.df <- cars

You can use lapply:

    > data.df <- data.frame(lapply(data.df, factor))

Or a for statement:

    > for(i in 1:ncol(data.df)) data.df[,i] <- as.factor(data.df[,i])

In either case, you end up with what you want:

    > str(data.df)
    'data.frame':   50 obs. of  2 variables:
     $ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
     $ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
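A related base R idiom, offered as a sketch alongside the two options above rather than as part of the original answer: assigning into `data.df[]` replaces the contents of every column in place, so the result stays a data frame with its original names and row names. Wrapping the `lapply()` result in `data.frame()` instead runs the names through `check.names = TRUE` by default, which can alter column names that are not syntactically valid (for example, names containing spaces). The example reuses the same `cars` data.

```r
data.df <- cars

# data.df[] keeps the data.frame structure; only the column contents change
data.df[] <- lapply(data.df, factor)

str(data.df)
```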



Answer 2:


I found an alternative solution in the plyr package:

    # load the package and data
    > library(plyr)
    > data.df <- cars

Use the colwise function:

    > data.df <- colwise(factor)(data.df)
    > str(data.df)
    'data.frame':   50 obs. of  2 variables:
     $ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
     $ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...

Incidentally, if you look inside the colwise function, it just uses lapply:

    df <- as.data.frame(lapply(filtered, .fun, ...))
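To illustrate that point, here is a rough base R sketch of a colwise-style function factory. `colwise_base()` and `factorize_all()` are hypothetical names, and this is a simplification rather than plyr's actual implementation: the real `colwise()` also accepts a `.cols` argument for choosing which columns to process, which this sketch omits.

```r
# Hypothetical, simplified stand-in for plyr::colwise():
# takes a function and returns a new function that applies it
# to every column of a data frame via lapply().
colwise_base <- function(.fun) {
  function(df, ...) as.data.frame(lapply(df, .fun, ...))
}

# Build a "factor every column" function, then call it on the data
factorize_all <- colwise_base(factor)
data.df <- factorize_all(cars)

str(data.df)
```

The two-step call mirrors `colwise(factor)(data.df)`: the first call builds a function, and the second applies it to a data frame.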


Source: https://stackoverflow.com/questions/1489199/how-can-i-take-multiple-vectors-and-recode-their-datatypes-in-r
