Recoding variables in R seems to be my biggest headache. What functions, packages, and processes do you use to ensure the best result?
I've found very few useful exam
Recoding can mean a lot of things, and is fundamentally complicated.
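Changing the levels of a factor can be done using the levels function. The new names are assigned by position, so check the current order with levels(veteran$celltype) first (veteran is a dataset in the survival package):
> library(survival)
> #change the levels of a factor
> levels(veteran$celltype) <- c("s","sc","a","l")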
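Because positional assignment silently mislabels data if the level order is not what you expect, it can be safer to rename by name. A minimal sketch, using a made-up toy factor (cell is hypothetical, not the veteran data) and the list form of levels<-, which also lets you merge several old levels into one:

```r
# Toy factor standing in for a real column (hypothetical data)
cell <- factor(c("squamous", "smallcell", "adeno", "large", "adeno"))

# Rename by name instead of by position: new_name = "old_name"
cell_short <- cell
levels(cell_short) <- list(s = "squamous", sc = "smallcell",
                           a = "adeno", l = "large")

# Collapse several old levels into one new level
cell_merged <- cell_short
levels(cell_merged) <- list(small = c("s", "sc"), other = c("a", "l"))

# Cross-tabulate old against new to confirm the mapping
table(cell, cell_merged)
```

Any old level left out of the list becomes NA, so the cross-tabulation at the end is worth keeping as a habit.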
Transforming a continuous variable simply involves the application of a vectorized function:
> mtcars$mpg.log <- log(mtcars$mpg)
For binning continuous data, look at cut (base R) and cut2 (in the Hmisc package). For example:
> library(Hmisc)
> #make 4 groups with roughly equal sample sizes
> mtcars[['mpg.tr']] <- cut2(mtcars[['mpg']], g=4)
> #make 4 groups with equal bin width
> mtcars[['mpg.tr2']] <- cut(mtcars[['mpg']],4, include.lowest=TRUE)
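If you want to choose the cut points yourself, or want quantile bins without installing anything, cut can do both. A sketch in base R only; the break values 14 and 24 and the variable names mpg_band and mpg_quart are just illustrative choices:

```r
# Explicit cut points with readable labels.
# With the default right = TRUE the intervals are (-Inf,14], (14,24], (24,Inf]
mpg_band <- cut(mtcars$mpg,
                breaks = c(-Inf, 14, 24, Inf),
                labels = c("low", "mid", "high"))
table(mpg_band)

# Quartile-based bins: compute the breaks with quantile(), then cut on them.
# include.lowest = TRUE keeps the minimum value inside the first bin.
q <- quantile(mtcars$mpg, probs = seq(0, 1, by = 0.25))
mpg_quart <- cut(mtcars$mpg, breaks = q, include.lowest = TRUE)
table(mpg_quart)
```

Using -Inf and Inf as the outer breaks means no value can fall outside the bins and turn into NA. The quantile approach fails if two quantiles coincide (cut requires unique breaks), which can happen with heavily tied data.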
For recoding continuous or factor variables into a categorical variable, there is recode in the car package and recode.variables in the Deducer package:
> library(Deducer)
> mtcars[c("mpg.tr2")] <- recode.variables(mtcars[c("mpg")] , "Lo:14 -> 'low';14:24 -> 'mid';else -> 'high';")
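If installing car or Deducer is not an option, a plain value-to-value recode can be approximated in base R with a named lookup vector. This is a swapped-in technique, not what those packages do internally, and the codes and labels below (raw, lookup, "other") are invented for illustration:

```r
# Hypothetical codes, as they might arrive in a raw data file
raw <- c("m", "f", "f", "u", "m")

# Named vector: names are the old values, elements are the new ones
lookup <- c(m = "male", f = "female")

# Index the lookup by the old values; codes without an entry come back as NA
recoded <- unname(lookup[raw])

# The 'else' branch: everything unmatched gets a catch-all label
recoded[is.na(recoded)] <- "other"

# Store the result as a factor with an explicit level order
sex <- factor(recoded, levels = c("female", "male", "other"))
table(raw, sex)
```

Note that values that were already NA in the input also end up as "other" here; handle them before the lookup if they should stay missing.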
If you are looking for a GUI, Deducer implements recoding with the Transform and Recode dialogs:
http://www.deducer.org/pmwiki/pmwiki.php?n=Main.TransformVariables
http://www.deducer.org/pmwiki/pmwiki.php?n=Main.RecodeVariables