R: use of factor

再見小時候 2020-12-24 04:35

I have some data:

transaction <- c(1, 2, 3)
date <- c("2010-01-31", "2010-02-28", "2010-03-31")
type <- c("debit", "debit", "credit")
amo         


        
3 Answers
  •  独厮守ぢ
    2020-12-24 05:16

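    Factors vs character vectors when doing stats: In terms of doing statistics, R's model-fitting functions treat factors and character vectors the same way, because a character vector used as a predictor is converted to a factor before the model is fit. In fact, it's often easier to leave categorical variables as character vectors.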

    If you fit a regression or ANOVA with lm() using a character vector as a categorical variable, you'll get the normal model output, possibly along with a message such as:

    Warning message:
    In model.matrix.default(mt, mf, contrasts) :
      variable 'character_x' converted to a factor
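To see why the two give the same model, the sketch below builds the design matrix that lm() works from, via model.matrix(), once with a character column and once with the equivalent factor. The toy data and the column names `g_chr`/`g_fac` are hypothetical. Assuming the default treatment contrasts, both matrices should contain an intercept plus one 0/1 dummy column for "dog", with "cat" as the reference level.

```r
# Hypothetical toy data: the same categories stored as character and as factor
d <- data.frame(
  y     = c(1.2, 3.4, 2.2, 4.1),
  g_chr = c("cat", "dog", "cat", "dog"),
  stringsAsFactors = FALSE
)
d$g_fac <- factor(d$g_chr)

# Design matrices for each version of the predictor
m_chr <- model.matrix(y ~ g_chr, data = d)  # character column is converted to a factor internally
m_fac <- model.matrix(y ~ g_fac, data = d)

# Only the column names differ (g_chrdog vs g_facdog); the numbers are identical
print(m_chr)
print(m_fac)
```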
    

    Factors vs character vectors when manipulating data frames: When manipulating data frames, however, character vectors and factors are treated very differently. Some information on the annoyances of R & factors can be found on the Quantum Forest blog, R pitfall #3: friggin’ factors.
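A minimal sketch of two common manipulation pitfalls, reusing the `type` values from the question. The numeric-looking `amount_fac` column and the new value "transfer" are hypothetical, added only for illustration. as.numeric() on a factor returns the internal level codes rather than the original values, and assigning a value that isn't an existing level produces NA (with a warning), whereas a character vector simply stores it.

```r
# Values from the question, kept both as character and as factor
type_chr <- c("debit", "debit", "credit")
type_fac <- factor(type_chr)

# Hypothetical numeric-looking column that ended up stored as a factor
amount_fac <- factor(c("10", "25", "10"))

# Pitfall 1: as.numeric() on a factor returns the level codes, not the values
codes <- as.numeric(amount_fac)
# Converting to character first recovers the actual numbers
values <- as.numeric(as.character(amount_fac))

# Pitfall 2: a value that is not an existing level becomes NA (R warns about it)
suppressWarnings(type_fac[3] <- "transfer")
# A character vector just stores the new value
type_chr[3] <- "transfer"

print(codes)
print(values)
print(type_fac)
print(type_chr)
```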

    It's useful to set stringsAsFactors = FALSE when reading data from a .csv or .txt file with read.table() or read.csv() (since R 4.0.0 this is the default; earlier versions defaulted to TRUE). As noted in another reply, you have to make sure that the values in your character vector are spelled consistently, or else every typo will become a separate factor level. You can use the function gsub() to fix typos.
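A small sketch of that workflow: read the column as character, fix the misspellings with gsub(), and only then convert to a factor. It uses read.csv(text = ...) with an inline string in place of a real file; the data and the particular typos are made up for illustration.

```r
# Inline CSV standing in for a file on disk (hypothetical data with typos)
csv_text <- "type,amount
debit,100
Debit,50
debti,20
credit,75"

# Keep text columns as plain character vectors while cleaning
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
raw_type <- df$type
print(table(raw_type))  # each spelling is counted as its own category

# Fix the known typos with an anchored regular expression, then convert
df$type <- gsub("^(Debit|debti)$", "debit", df$type)
df$type <- factor(df$type)
print(levels(df$type))
```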

    Here is a worked example showing how lm() gives you the same results with a character vector and a factor.

    A random independent variable:

    continuous_x <- rnorm(10,10,3)
    

    A random categorical variable as a character vector:

    character_x <- rep(c("dog", "cat"), 5)
    

    Convert the character vector to a factor variable:

    factor_x <- as.factor(character_x)

    Give the two categories random values:

    character_x_value <- ifelse(character_x == "dog", 5*rnorm(1,0,1), rnorm(1,0,2))
    

    Create a random relationship between the independent variables and a dependent variable:

    continuous_y <- continuous_x*10*rnorm(1,0) + character_x_value
    

    Compare the output of a linear model with the factor variable and with the character vector. The estimates are the same; depending on your R version, the character-vector fit may also print the conversion warning.

    summary(lm(continuous_y ~ continuous_x + factor_x))
    summary(lm(continuous_y ~ continuous_x + character_x))
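To check the "same results" claim programmatically rather than by eye, one option is to compare the coefficient vectors directly. This sketch puts the variables in a data frame and fixes the seed so the run is reproducible; the response formula is an assumption chosen for illustration, not the one used above.

```r
set.seed(1)  # fixed seed so the comparison is reproducible
n <- 10
d <- data.frame(
  continuous_x = rnorm(n, 10, 3),
  character_x  = rep(c("dog", "cat"), n / 2),
  stringsAsFactors = FALSE
)
d$factor_x <- factor(d$character_x)

# Assumed relationship: slope 2, a +5 shift for "dog", plus noise
d$continuous_y <- 2 * d$continuous_x + ifelse(d$character_x == "dog", 5, 0) + rnorm(n)

fit_fac <- lm(continuous_y ~ continuous_x + factor_x, data = d)
fit_chr <- lm(continuous_y ~ continuous_x + character_x, data = d)

# Coefficient names differ (factor_xdog vs character_xdog), so compare values only
same_coefs <- isTRUE(all.equal(unname(coef(fit_fac)), unname(coef(fit_chr))))
print(same_coefs)
```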
    
