Add extra level to factors in dataframe

Submitted by 允我心安 on 2019-11-27 12:25:51

You could define a function that adds the extra level to a factor and returns anything else unchanged:

addNoAnswer <- function(x){
  if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
  return(x)
}

Then lapply this function over your columns:

df <- as.data.frame(lapply(df, addNoAnswer))

That should return what you want.
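A small usage sketch, assuming a made-up data frame df with two factor columns and one integer column (the data are hypothetical). It assigns into df[] instead of calling as.data.frame(lapply(...)), an alternative that keeps the original column names and data frame attributes as they are. Note that this only adds the level; existing NA values stay NA.

```r
# Helper restated from above so the snippet runs on its own
addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  x
}

# Hypothetical example data
df <- data.frame(
  q1 = factor(c("yes", "no", NA)),
  q2 = factor(c("a", NA, "b")),
  n  = 1:3
)

# Assigning into df[] replaces the columns but keeps the data.frame itself
df[] <- lapply(df, addNoAnswer)

levels(df$q1)  # "no" "yes" "No Answer"
class(df$n)    # "integer"
```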

The levels function accepts the levels(x) <- value form, so it is easy to add new levels:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
 Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1),"No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
 Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...

You can then loop over all the variables in a data.frame:

f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
df1 <- data.frame(f1,n1=1:11,f2,f3)

str(df1)
  'data.frame':   11 obs. of  4 variables:
  $ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
  $ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
  $ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...    

for(i in 1:ncol(df1)) if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]),"No Answer")
df1[is.na(df1)] <- "No Answer"

str(df1)
 'data.frame':   11 obs. of  4 variables:
  $ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
  $ n1: int  1 2 3 4 5 6 7 8 9 10 ...
  $ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
  $ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
Joe
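One thing to watch: df1[is.na(df1)] <- "No Answer" applies to every column, so a numeric column that contains NA would be coerced to character. A sketch that restricts both steps to factor columns, using a hypothetical helper name fillNoAnswer and made-up data:

```r
# Hypothetical helper: add the level and fill NAs, for factors only
fillNoAnswer <- function(x, label = "No Answer") {
  if (!is.factor(x)) return(x)          # leave non-factor columns alone
  levels(x) <- union(levels(x), label)  # add the level once, even on re-runs
  x[is.na(x)] <- label                  # replace missing values
  x
}

df1 <- data.frame(
  f1 = factor(c("a", NA, "b")),
  n1 = c(1, NA, 3)                      # the numeric NA should survive
)
df1[] <- lapply(df1, fillNoAnswer)

as.character(df1$f1)  # "a" "No Answer" "b"
df1$n1                # 1 NA 3
```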

Since this question was last answered, this has become possible with fct_explicit_na() from the forcats package. Here is the example given in its documentation.

f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))
table(f1)

# f1
# a b c 
# 4 2 2 

f2 <- forcats::fct_explicit_na(f1)
table(f2)

# f2
#     a         b         c (Missing) 
#     4         2         2         3 

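The default label is (Missing), but it can be changed via the na_level argument. In forcats 1.0.0 and later, fct_explicit_na() is deprecated in favour of fct_na_value_to_level().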
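For comparison, a base R sketch of the same idea without forcats: addNA() turns NA into an extra level, and that level is then given a label. The label "No Answer" is only an example.

```r
f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))

f2 <- addNA(f1)                               # NA becomes an extra level
levels(f2)[is.na(levels(f2))] <- "No Answer"  # give that level a label

table(f2)
```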

Expanding on ilir's answer and its comment, you can check that a column is a factor and that it does not already contain the new level before adding it, which makes the function safe to re-run:

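addLevel <- function(x, newlevel=NULL) {
  # Only touch factors, and only when some requested level is missing.
  # With newlevel=NULL, all(logical(0)) is TRUE, so x is returned unchanged
  # (the original is.na(match(...)) test errors on a zero-length condition).
  if(is.factor(x) && !all(newlevel %in% levels(x))) {
    return(factor(x, levels=union(levels(x), newlevel)))
  }
  return(x)
}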

You can then apply it like so:

dataFrame$column <- addLevel(dataFrame$column, "newLevel")
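To illustrate the re-runnable behaviour, here is a sketch on a hypothetical data frame. A compact variant of addLevel is restated so the snippet runs on its own; the data and the lapply call are assumptions for illustration.

```r
# Compact variant of addLevel, restated so this snippet is self-contained
addLevel <- function(x, newlevel = NULL) {
  if (is.factor(x) && !all(newlevel %in% levels(x))) {
    return(factor(x, levels = union(levels(x), newlevel)))
  }
  x
}

# Hypothetical data
dataFrame <- data.frame(column = factor(c("a", "b", NA)), n = 1:3)

# Running it twice adds the level only once
dataFrame[] <- lapply(dataFrame, addLevel, newlevel = "newLevel")
dataFrame[] <- lapply(dataFrame, addLevel, newlevel = "newLevel")

levels(dataFrame$column)  # "a" "b" "newLevel"
```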

You need to convert the column to character, then add the new level based on the condition, and finally convert the column back to a factor.

Steps:

1. First convert the factor column to character:

        df$column2 <- as.character(df$column2)

2. Add the new level:

        df[df$column1 == "XYZ", "column2"] <- "new_level"

3. Convert to factor again:

        df$column2 <- as.factor(df$column2)
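The three steps above can be tried end to end on a small example. The data frame below is hypothetical; only the column names column1 and column2 and the values "XYZ" and "new_level" come from the answer.

```r
# Hypothetical data: column1 drives the condition, column2 is the factor
df <- data.frame(
  column1 = c("ABC", "XYZ", "XYZ"),
  column2 = factor(c("low", "high", "low")),
  stringsAsFactors = FALSE
)

df$column2 <- as.character(df$column2)          # 1. factor -> character
df$column2[df$column1 == "XYZ"] <- "new_level"  # 2. assign the new value
df$column2 <- as.factor(df$column2)             # 3. character -> factor

levels(df$column2)  # "low" "new_level"
```

Note that the round trip drops levels that are no longer used ("high" here) and re-sorts the remaining levels, which may or may not be what you want.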

I have a very simple answer that may not directly address your specific scenario, but it is a simple way to do this in general:

levels(df$column) <- c(levels(df$column), newFactorLevel)
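A short sketch of why the level has to exist first, using a made-up factor. Here "z" plays the role of newFactorLevel, which in the line above is assumed to be a variable holding the new label.

```r
f <- factor(c("a", "b"))

# Assigning a value that is not a level yields NA (with a warning)
g <- f
suppressWarnings(g[1] <- "z")
is.na(g[1])  # TRUE

# After extending the levels, the same assignment works
levels(f) <- c(levels(f), "z")
f[1] <- "z"
as.character(f)  # "z" "b"
```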

A factor stores each value as an integer code that indexes into its levels, and the levels are the distinct values kept as character strings, even when they look like numbers. Factors are convenient for categorical variables, for example in visualizations. Because of this storage, calling as.numeric directly on a factor returns the integer codes, which start at 1, not the original values. To retrieve the original values, convert with as.character first, which returns the labels rather than the codes, and then use as.numeric to get the original numeric values.

Suppose df$factor_var is a factor whose levels are numeric values stored as characters:

factor_var.values = as.numeric(as.character(df$factor_var))
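A minimal sketch of the difference, on a hypothetical factor. The last line is an equivalent alternative that converts each level only once.

```r
# Hypothetical column: numbers that ended up as a factor
factor_var <- factor(c("10", "5", "10", "20"))

codes   <- as.numeric(factor_var)                # 1 3 1 2 (integer codes)
values  <- as.numeric(as.character(factor_var))  # 10 5 10 20
values2 <- as.numeric(levels(factor_var))[factor_var]  # same as values
```

The codes look unrelated to the data because the levels are sorted as strings ("10", "20", "5"), so "5" gets code 3.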
