r-factor

Confusion between factor levels and factor labels

旧巷老猫 submitted on 2019-11-26 21:17:34
There seems to be a difference between the levels and the labels of a factor in R. Until now, I had always thought that levels were the 'real' names of factor levels, and labels were the names used for output (such as tables and plots). Obviously, this is not the case, as the following example shows: df <- data.frame(v=c(1,2,3), f=c('a','b','c')) str(df) 'data.frame': 3 obs. of 2 variables: $ v: num 1 2 3 $ f: Factor w/ 3 levels "a","b","c": 1 2 3 df$f <- factor(df$f, levels=c('a','b','c'), labels=c('Treatment A: XYZ','Treatment B: YZX','Treatment C: ZYX')) levels(df$f) [1] "Treatment A: XYZ" "Treatment B:
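A minimal sketch of what the example shows: in factor(), `levels` lists the values to look for in the input, and `labels` supplies the names those levels get in the result, so afterwards the labels *are* the levels. (The str() output quoted above comes from R versions before 4.0.0, where data.frame() converted strings to factors by default.) The vector `x` below is illustrative.

```r
# Illustrative input values, like the question's column f
x <- c("a", "b", "c")

# `levels` says which input values to match; `labels` names them in the result
f <- factor(x,
            levels = c("a", "b", "c"),
            labels = c("Treatment A", "Treatment B", "Treatment C"))

levels(f)           # the labels have become the levels
as.integer(f)       # the underlying codes are unchanged: 1 2 3
f == "a"            # FALSE FALSE FALSE, since "a" is no longer a level
f == "Treatment A"  # TRUE FALSE FALSE
```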

Convert Factor to Date/Time in R

不问归期 submitted on 2019-11-26 20:56:12
Question: This is the information contained in my data frame: ## minuteofday: factor w/ 89501 levels "2013-06-01 08:07:00",... ## dDdt: num 7.8564 2.318 ... ## minutes: POSIXlt, format: NA NA NA I need to convert the minuteofday column to a date/time format: minuteave$minutes <- as.POSIXlt(as.character(minuteave$minuteofday), format="%m/%d/%Y %H:%M:%S") I've tried as.POSIXlt, as.POSIXct and as.Date. None of them worked. Does anyone have any ideas? The goal is to plot minutes vs. dDdt, but it
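The levels shown by str() are written as year-month-day (for example "2013-06-01 08:07:00"), while the format string in the question expects month/day/year, so every value fails to parse and comes back NA. A minimal sketch with a matching format, assuming all values look like the one shown and that UTC is an acceptable time zone; the second timestamp is made up for illustration. POSIXct is used here because it stores each time as a single number, which is usually easier to keep in a data frame column than the list-based POSIXlt.

```r
# Illustrative stand-in for minuteave$minuteofday (second value is made up)
minuteofday <- factor(c("2013-06-01 08:07:00", "2013-06-01 08:08:00"))

# Convert the factor to character first, then parse with a format that matches.
# tz = "UTC" is an assumption; use the time zone the data were recorded in.
minutes <- as.POSIXct(as.character(minuteofday),
                      format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

minutes
```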

Replacing numbers within a range with a factor

这一生的挚爱 submitted on 2019-11-26 19:11:54
Given a data frame column which is a series of integers (age), I want to convert ranges of integers into ordinal variables. My current code doesn't work; how do I do this? df <- read.table("http://dl.dropbox.com/u/822467/df.csv", header = TRUE, sep = ",") df[(df >= 0) & (df <= 14)] <- "Age1" df[(df >= 15) & (df <= 44)] <- "Age2" df[(df >= 45) & (df <= 64)] <- "Age3" df[(df > 64)] <- "Age4" table(df) Answer: Use cut to do this in one step: dfc <- cut(df$x, breaks=c(0, 15, 45, 56, Inf)) str(dfc) Factor w/ 4 levels "(0,15]","(15,45]",..: 3 4 3 2 2 4 2 2 4 4 ... Once you are satisfied that the breaks are
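The breaks in the answer do not line up exactly with the age bands in the question: they use 56 rather than 64, and cut() closes intervals on the right by default, so "(0,15]" includes 15 but not 0. A sketch assuming the question's bands 0-14, 15-44, 45-64 and 65+ for whole-number ages; the `ages` vector is illustrative and stands in for the data frame column. `ordered_result = TRUE` is used because the question asks for ordinal values.

```r
# Illustrative ages standing in for the data frame column
ages <- c(0, 14, 15, 44, 45, 64, 65, 90)

# right = FALSE gives intervals [0,15), [15,45), [45,65), [65,Inf),
# which match 0-14, 15-44, 45-64 and 65+ for whole-number ages
age_group <- cut(ages,
                 breaks = c(0, 15, 45, 65, Inf),
                 right  = FALSE,
                 labels = c("Age1", "Age2", "Age3", "Age4"),
                 ordered_result = TRUE)

table(age_group)
```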

How to change order of boxplots when using ggplot2?

家住魔仙堡 submitted on 2019-11-26 18:13:17
Question: This question follows from this other one. I was unable to implement the answers there. Define: df2 <- data.frame(variable=rep(c("vnu.shr","vph.shr"),each=10), value=seq(1:20)) Plot: require(ggplot2) qplot(variable,value, data=df2,geom="boxplot")+ geom_jitter(position=position_jitter(w=0.1,h=0.1)) I would like to have the boxplots in reverse order (i.e. the one on the right moved to the left, and so on). I have tried various ways of reordering the factors using levels, ordered, relevel, rev and so on, but I
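A discrete axis in ggplot2 follows the order of the factor levels, so one way to reverse the boxplots is to rebuild the factor with its levels reversed before plotting. A minimal sketch using the question's `df2`; it calls ggplot() rather than qplot(), which ggplot2 3.4.0 and later mark as deprecated, and it only draws the plot if ggplot2 is installed.

```r
df2 <- data.frame(variable = rep(c("vnu.shr", "vph.shr"), each = 10),
                  value    = 1:20)

# Rebuild the factor with its levels in reverse alphabetical order
df2$variable <- factor(df2$variable,
                       levels = rev(sort(unique(df2$variable))))
levels(df2$variable)  # "vph.shr" "vnu.shr"

# Boxes are drawn in level order, so "vph.shr" now appears on the left
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df2, aes(x = variable, y = value)) +
    geom_boxplot() +
    geom_jitter(width = 0.1, height = 0.1)
  print(p)
}
```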

Why does as.factor return a character when used inside apply?

三世轮回 submitted on 2019-11-26 17:40:56
Question: I want to convert variables into factors using apply(): a <- data.frame(x1 = rnorm(100), x2 = sample(c("a","b"), 100, replace = T), x3 = factor(c(rep("a",50) , rep("b",50)))) a2 <- apply(a, 2, as.factor) apply(a2, 2, class) results in: x1 x2 x3 "character" "character" "character" I don't understand why this results in character vectors instead of factor vectors. Answer 1: apply converts your data.frame to a character matrix. Use lapply: lapply(a, class) # $x1 # [1] "numeric" # $x2 # [1] "factor" #
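The answer's point can be checked directly: apply() coerces a data frame with as.matrix(), and a data frame with any non-numeric column becomes a character matrix. lapply() instead iterates over the columns as a list, so assigning its result back with `a[] <-` keeps a data frame of factors. A minimal sketch reusing the question's data, with set.seed() added only for reproducibility:

```r
set.seed(42)
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

# apply() works on as.matrix(a), which is a character matrix here
is.character(as.matrix(a))  # TRUE

# lapply() works column by column; a[] <- keeps the data.frame structure
a[] <- lapply(a, as.factor)
sapply(a, class)  # every column is now "factor"
```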

Idiom for ifelse-style recoding for multiple categories

孤街醉人 submitted on 2019-11-26 17:33:17
I run across this often enough that I figure there has to be a good idiom for it. Suppose I have a data.frame with a bunch of attributes, including "product." I also have a key which translates products to brand + size. Product codes 1-3 are Tylenol, 4-6 are Advil, 7-9 are Bayer, and 10-12 are Generic. What's the fastest (in terms of human time) way to code this up? I tend to use nested ifelse() calls if there are 3 or fewer categories, and type out the data table and merge it in if there are more than 3. Any better ideas? Stata has a recode command that is pretty nifty for this sort of thing, although
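Because the product codes form consecutive blocks of three, one low-typing option is a lookup vector indexed by the code itself. A sketch assuming whole-number codes from 1 to 12 in a column named `product`; the data frame `df` is hypothetical, and only the brand half of the key is shown because the sizes are not given. For irregular codes, a small key table combined with match() does the same job and keeps the original row order.

```r
# Hypothetical data: product codes 1-12 as described in the question
df <- data.frame(product = c(1, 3, 4, 7, 12))

# Position k of the key holds the brand for product code k
brand_key <- rep(c("Tylenol", "Advil", "Bayer", "Generic"), each = 3)
df$brand <- brand_key[df$product]

# More general: a key table plus match(), which also works for irregular codes
key <- data.frame(product = 1:12, brand = brand_key, stringsAsFactors = FALSE)
df$brand2 <- key$brand[match(df$product, key$product)]

df
```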

Colouring plot by factor in R

放肆的年华 submitted on 2019-11-26 15:46:29
Question: I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code: data <- iris plot(data$Sepal.Length, data$Sepal.Width, col=data$Species) This is all well and good, but how do I know which factor level has been given which colour? Answer 1: data <- iris plot(data$Sepal.Length, data$Sepal.Width, col=data$Species) legend(7, 4.3, levels(data$Species), col=seq_len(nlevels(data$Species)), pch=1) should do it for you. But I prefer ggplot2 and would
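Indexing an explicit colour vector by the factor's integer codes keeps the point colours and the legend in step, because both are built from the same vector and the same level order. A minimal sketch; the colour names and the legend position "topright" are arbitrary choices.

```r
data <- iris

# One colour per level, in level order (colour choices are arbitrary)
cols <- c("red", "darkgreen", "blue")
point_cols <- cols[as.integer(data$Species)]

plot(data$Sepal.Length, data$Sepal.Width, col = point_cols, pch = 1,
     xlab = "Sepal.Length", ylab = "Sepal.Width")
legend("topright", legend = levels(data$Species), col = cols, pch = 1)
```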

Coerce multiple columns to factors at once

人走茶凉 submitted on 2019-11-26 14:23:53
I have a sample data frame like this: data <- data.frame(matrix(sample(1:40), 4, 10, dimnames = list(1:4, LETTERS[1:10]))) I want to know how I can select multiple columns and convert them all to factors at once. I usually do it like data$A = as.factor(data$A). But when the data frame is very large and contains lots of columns, doing it this way is very time-consuming. Does anyone know of a better way? Answer: Choose some columns to coerce to factors: cols <- c("A", "C", "D", "H") Use lapply() to coerce and replace the chosen columns: data[cols] <- lapply(data[cols], factor) ## as
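A quick way to confirm that only the chosen columns changed is to check each column's class afterwards. A minimal sketch reusing the question's data frame and the answer's `cols`; set.seed() is added only to make the sample reproducible.

```r
set.seed(1)
data <- data.frame(matrix(sample(1:40), 4, 10,
                          dimnames = list(1:4, LETTERS[1:10])))
cols <- c("A", "C", "D", "H")

# Coerce and replace only the chosen columns
data[cols] <- lapply(data[cols], factor)

# Check the class of every column: A, C, D, H are factors, the rest integers
sapply(data, class)
```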

Why use as.factor() instead of just factor()

主宰稳场 submitted on 2019-11-26 13:01:37
Question: I recently saw Matt Dowle write some code with as.factor(), specifically for (col in names_factors) set(dt, j=col, value=as.factor(dt[[col]])) in a comment to this answer. I used this snippet, but I needed to explicitly set the factor levels to make sure the levels appear in my desired order, so I had to change as.factor(dt[[col]]) to factor(dt[[col]], levels = my_levels). This got me thinking: what (if any) is the benefit of using as.factor() versus just factor()? Answer 1: as.factor is a
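Two differences are easy to check in a console: for a character vector, both functions use sorted order for the levels unless factor() is given `levels` explicitly; and when the input is already a factor, as.factor() returns it unchanged while factor() rebuilds it and drops levels that no longer occur. A minimal sketch with illustrative vectors:

```r
x <- c("low", "high", "medium")

levels(as.factor(x))                                    # sorted: "high" "low" "medium"
levels(factor(x, levels = c("low", "medium", "high")))  # order chosen explicitly

# An existing factor with an unused level "c"
f <- factor(c("b", "a"), levels = c("c", "b", "a"))
levels(as.factor(f))  # returned unchanged: "c" "b" "a"
levels(factor(f))     # rebuilt, unused level dropped: "b" "a"
```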

Directly creating dummy variable set in a sparse matrix in R

心已入冬 submitted on 2019-11-26 11:33:45
Question: Suppose you have a data frame with a large number of columns (1000 factors, each with 15 levels). You'd like to create a dummy variable data set, but since it would be very sparse, you would like to keep the dummies in sparse matrix format. My data set is quite big, and the fewer steps there are, the better for me. I know how to do the steps above, but I couldn't get my head around directly creating that sparse matrix from the initial data set, i.e. having one step instead of two. Any ideas? EDIT: Some
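One way to skip the dense step is to compute the position of each 1 directly: in every row, each factor contributes a single 1 in the column for its level. A minimal sketch with the Matrix package's sparseMatrix(), assuming every column is a factor with no NA values; the small `df` is illustrative and the `<column>_<level>` naming scheme is hypothetical. Matrix also provides sparse.model.matrix(), which builds a sparse design matrix from a formula, though with default contrasts it does not produce one column for every level of every factor.

```r
library(Matrix)

# Illustrative data: every column is a factor (no NA values assumed)
set.seed(1)
df <- data.frame(
  f1 = factor(sample(letters[1:3], 5, replace = TRUE), levels = letters[1:3]),
  f2 = factor(sample(LETTERS[1:4], 5, replace = TRUE), levels = LETTERS[1:4])
)

n <- nrow(df)
n_levels <- vapply(df, nlevels, integer(1))
# Column offset of each factor's block of dummy columns
offsets <- c(0L, cumsum(n_levels)[-length(n_levels)])

# Row index repeats once per factor; column index = block offset + level code
i <- rep(seq_len(n), times = ncol(df))
j <- unlist(Map(function(f, off) as.integer(f) + off, df, offsets), use.names = FALSE)

# Hypothetical naming scheme: <column>_<level>
col_names <- unlist(Map(function(nm, f) paste0(nm, "_", levels(f)), names(df), df),
                    use.names = FALSE)

dummies <- sparseMatrix(i = i, j = j, x = 1,
                        dims = c(n, sum(n_levels)),
                        dimnames = list(NULL, col_names))
dummies
```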