Convert data.frame column format from character to factor

北慕城南 submitted on 2019-12-27 10:43:20

Question


I would like to change the format (class) of some columns of my data.frame object (mydf) from character to factor.

I don't want to do this while reading the text file with the read.table() function.

Any help would be appreciated.


Answer 1:


Hi, welcome to the world of R.

mtcars       # look at this built-in data set
str(mtcars)  # shows the classes of the variables (all numeric)

# one approach is to index with the $ sign and use the as.factor function
mtcars$am <- as.factor(mtcars$am)
# another approach: index by column name
mtcars[, 'cyl'] <- as.factor(mtcars[, 'cyl'])
str(mtcars)  # now look at the classes

The same pattern works for other conversions, such as as.character(), as.Date(), and as.integer().
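Because mtcars contains only numeric columns, a minimal sketch with an actual character column may be closer to the question. The data frame mydf and its columns group and score are made up for illustration.

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps 'group' as character
mydf <- data.frame(group = c("a", "b", "a"),
                   score = c(1.5, 2.0, 3.5),
                   stringsAsFactors = FALSE)
class(mydf$group)   # character before the conversion

mydf$group <- as.factor(mydf$group)
class(mydf$group)   # factor after the conversion
levels(mydf$group)  # the distinct values, sorted
```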

Since you're new to R, I'd suggest you have a look at these two websites:

R reference manuals: http://cran.r-project.org/manuals.html

R Reference card: http://cran.r-project.org/doc/contrib/Short-refcard.pdf




Answer 2:


# To do it for all names
df[] <- lapply(df, factor)  # the "[]" keeps the dataframe structure
col_names <- names(df)
# to do it for some names in a vector named 'col_names'
df[col_names] <- lapply(df[col_names], factor)

Explanation: All dataframes are lists, and the result of [ used with a multi-valued argument is likewise a list, so looping over the columns is a task for lapply. The assignment above creates a list of factors that the function [<-.data.frame sticks back into the dataframe, df.
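To see why the "[]" matters, here is a small sketch comparing the plain result of lapply with the "[]" assignment. The data frame and its columns x and y are hypothetical.

```r
# Hypothetical example data
df <- data.frame(x = c("a", "b"), y = c("u", "v"), stringsAsFactors = FALSE)

# lapply on its own returns a plain list, not a data frame
as_list <- lapply(df, factor)
class(as_list)   # "list"

# With "[]", the columns are replaced inside the existing data frame
df[] <- lapply(df, factor)
class(df)        # "data.frame"
sapply(df, class)
```

Writing df <- lapply(df, factor) without the brackets would therefore replace the data frame with a list.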

Another strategy would be to convert only those columns where the number of unique items is less than some criterion, for example fewer than the base-10 log of the number of rows:

cols.to.factor <- sapply( df, function(col) length(unique(col)) < log10(length(col)) )
df[ cols.to.factor] <- lapply(df[ cols.to.factor] , factor)
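A sketch of how this criterion behaves, using made-up data with 1000 rows so that the threshold is log10(1000) = 3. The column names flag and id are assumptions for illustration.

```r
# Hypothetical example data: 'flag' has 2 unique values, 'id' has 1000
df <- data.frame(flag = rep(c("yes", "no"), 500),
                 id = as.character(1:1000),
                 stringsAsFactors = FALSE)

cols.to.factor <- sapply(df, function(col) length(unique(col)) < log10(length(col)))
cols.to.factor   # TRUE for 'flag', FALSE for 'id'

df[cols.to.factor] <- lapply(df[cols.to.factor], factor)
sapply(df, class)
```

Note that log10 grows slowly: with 100 rows or fewer, at most constant columns meet this criterion, so a different threshold may suit small data sets better.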



Answer 3:


You could use dplyr::mutate_if() to convert all character columns to factors, or dplyr::mutate_at() to convert only selected, named character columns:

library(dplyr)

# all character columns to factor:
df <- mutate_if(df, is.character, as.factor)

# select character columns 'char1', 'char2', etc. to factor:
df <- mutate_at(df, vars(char1, char2), as.factor)
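In dplyr 1.0.0 and later, the scoped verbs mutate_if() and mutate_at() are superseded by across(). A sketch of the equivalent calls, assuming a data frame with the hypothetical columns char1, char2, and num1:

```r
library(dplyr)

# Hypothetical example data
df <- data.frame(char1 = c("a", "b"), char2 = c("x", "y"), num1 = c(1, 2),
                 stringsAsFactors = FALSE)

# all character columns to factor
df_all <- mutate(df, across(where(is.character), as.factor))

# only the named columns to factor
df_some <- mutate(df, across(c(char1, char2), as.factor))

sapply(df_all, class)
```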



Answer 4:


If you want to change all character variables in your data.frame to factors after you've already loaded your data, you can do it like this, to a data.frame called dat:

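# TRUE for each column that is of class character
character_vars <- sapply(dat, is.character)
# list-style indexing (no comma) keeps a data.frame even when only one column matches
dat[character_vars] <- lapply(dat[character_vars], as.factor)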

This creates a logical vector identifying which columns are of class character, then applies as.factor to those columns.

Sample data:

dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE
                  )
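Putting the pieces together on the sample data, a minimal sketch. It uses sapply() with is.character() and list-style indexing, a variation that also behaves well when only one column is character.

```r
dat <- data.frame(var1 = c("a", "b"),
                  var2 = c("hi", "low"),
                  var3 = c(0, 0.1),
                  stringsAsFactors = FALSE)

# logical vector: TRUE for the character columns var1 and var2
character_vars <- sapply(dat, is.character)

# replace only the character columns; var3 stays numeric
dat[character_vars] <- lapply(dat[character_vars], as.factor)
sapply(dat, class)
```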



Answer 5:


Another short way is to use the assignment pipe (%<>%) from the magrittr package. The example below converts the character column mycolumn to a factor and assigns the result back to that column.

library(magrittr)

mydf$mycolumn %<>% factor
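For comparison, %<>% pipes the left-hand side into the function and assigns the result back to it, so the line above does the same thing as the base R statement sketched here. The example data is made up for illustration.

```r
# Hypothetical example data
mydf <- data.frame(mycolumn = c("low", "high", "low"), stringsAsFactors = FALSE)

# Base R spelling of: mydf$mycolumn %<>% factor
mydf$mycolumn <- factor(mydf$mycolumn)
levels(mydf$mycolumn)
```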



Answer 6:


I've been doing it with a loop. In this case I only convert character variables to factors:

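# seq_len() is safe when there are zero columns; [[i]] always extracts a single column
for (i in seq_len(ncol(data))) {
    if (is.character(data[[i]])) {
        data[[i]] <- factor(data[[i]])
    }
}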
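The same loop can be wrapped in a function so that it is reusable across data frames. This is only a sketch; the helper name chars_to_factors and the example data are hypothetical.

```r
# Hypothetical helper: returns a copy of 'data' with character columns as factors
chars_to_factors <- function(data) {
  for (i in seq_along(data)) {
    if (is.character(data[[i]])) {
      data[[i]] <- factor(data[[i]])
    }
  }
  data
}

# Usage on made-up data
mydf <- data.frame(name = c("x", "y"), value = c(1, 2), stringsAsFactors = FALSE)
mydf <- chars_to_factors(mydf)
sapply(mydf, class)
```

Because R passes arguments by value, the original data frame is left unchanged unless the result is assigned back, as in the usage line.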


Source: https://stackoverflow.com/questions/9251326/convert-data-frame-column-format-from-character-to-factor
