问题
As usual, I got some SPSS file that I've imported into R with spss.get
function from Hmisc
package. I'm bothered with labelled
class that Hmisc::spss.get
adds to all variables in data.frame
, hence want to remove it.
labelled
class gives me headaches when I try to run ggplot
or even when I want to do some menial analysis! One solution would be to remove labelled
class from each variable in data.frame
. How can I do that? Is that possible at all? If not, what are my other options?
I really want to bypass reediting variables "from scratch" with as.data.frame(lapply(x, as.numeric))
and as.character
where applicable... And I certainly don't want to run SPSS and remove labels manually (don't like SPSS, nor care to install it)!
Thanks!
回答1:
You can avoid creating "labelled" variables in spss.get with the argument: , use.value.labels=FALSE.
w <- spss.get('/tmp/my.sav', use.value.labels=FALSE, datevars=c('birthdate','deathdate'))
The code from Bhattacharya could fail if the class of the labelled vector were simply "labelled" rather than c("labelled", "factor") in which case it should have been:
class(x[[i]]) <- NULL # no error from assignment of empty vector
The error you report can be reproduced with this code:
> b <- 4:6
> label(b) <- 'B Label'
> str(b)
Class 'labelled' atomic [1:3] 4 5 6
..- attr(*, "label")= chr "B Label"
> class(b) <- class(b)[-1]
Error in class(b) <- class(b)[-1] :
invalid replacement object to be a class string
回答2:
Here's how I get rid of the labels altogether. Similar to Jyotirmoy's solution but works for a vector as well as a data.frame. (Partial credits to Frank Harrell)
clear.labels <- function(x) {
if(is.list(x)) {
for(i in 1 : length(x)) class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled')
for(i in 1 : length(x)) attr(x[[i]],"label") <- NULL
}
else {
class(x) <- setdiff(class(x), "labelled")
attr(x, "label") <- NULL
}
return(x)
}
Use as follows:
my.unlabelled.df <- clear.labels(my.labelled.df)
回答3:
You can try out the read.spss
function from the foreign
package.
A rough and ready way to get rid of the labelled
class created by spss.get
for (i in 1:ncol(x)) {
z<-class(x[[i]])
if (z[[1]]=='labelled'){
class(x[[i]])<-z[-1]
attr(x[[i]],'label')<-NULL
}
}
But can you please give an example where labelled
causes problems?
If I have a variable MAED
in a data frame x
created by spss.get
, I have:
> class(x$MAED)
[1] "labelled" "factor"
> is.factor(x$MAED)
[1] TRUE
So well-written code that expects a factor (say) should not have any problems.
回答4:
Well, I figured out that unclass
function can be utilized to remove classes (who would tell, aye?!):
library(Hmisc)
# let's presuppose that variable x is gathered through spss.get() function
# and that x is factor
> class(x)
[1] "labelled" "factor"
> foo <- unclass(x)
> class(foo)
[1] "integer"
It's not the luckiest solution, just imagine back-converting bunch of vectors... If anyone tops this, I'll check it as an answer...
回答5:
Suppose:
library(Hmisc)
w <- spss.get('...')
You could remove the labels of a variable called "var1" by using:
attributes(w$var1)$label <- NULL
If you also want to remove the class "labbled", you could do:
class(w$var1) <- NULL
or if the variable has more than one class:
class(w$var1) <- class(w$var1)[-which(class(w$var1)=="labelled")]
Hope this helps!
来源:https://stackoverflow.com/questions/2394902/remove-variable-labels-attached-with-foreign-hmisc-spss-import-functions