r-factor

How can I convert a factor column that contains decimal numbers to numeric?

独自空忆成欢 submitted on 2019-11-27 18:35:53
Question: I have a CSV file, and when I read it with this command

    SOLK <- read.table('Book1.csv', header=TRUE, sep=';')

I get this output:

    > SOLK
              Time Close Volume
    1   10:27:03,6  0,99   1000
    2   10:32:58,4  0,98    100
    3   10:34:16,9  0,98    600
    4   10:35:46,0  0,97    500
    5   10:35:50,6  0,96     50
    6   10:35:50,6  0,96   1000
    7   10:36:10,3  0,95     40
    8   10:36:10,3  0,95    100
    9   10:36:10,4  0,95    500
    10  10:36:10,4  0,95    100
    .   .           .       .
    285 17:09:44,0  0,96    404

Calling str(SOLK) gives this:

    'data.frame': 285 obs. of 3 variables: $ Time : Factor w/ 174
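
A minimal sketch of the usual fix, assuming the Close column came in as a factor whose labels use a decimal comma. The vector `close_factor` is a hypothetical stand-in for `SOLK$Close`, since the original file is not available here.

```r
# Hypothetical stand-in for SOLK$Close: a factor with decimal-comma labels.
close_factor <- factor(c("0,99", "0,98", "0,98", "0,97"))

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert to character first and replace the comma with a dot.
close_numeric <- as.numeric(sub(",", ".", as.character(close_factor), fixed = TRUE))
print(close_numeric)

# Alternative at read time: read.table() has a dec argument, so passing
# dec = "," lets R parse such columns as numeric directly.
```

The Time column would still need separate handling, because it is not a plain number.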

R: factor levels, recode rest to 'other'

会有一股神秘感。 submitted on 2019-11-27 16:18:58
Question: I use factors somewhat infrequently and generally find them comprehensible, but I am often fuzzy about the details of specific operations. Currently, I am collapsing categories with few observations into "other" and am looking for a quick way to do that. I have perhaps 20 levels of a variable, but am interested in collapsing a bunch of them into one.

    data <- data.frame(employees = sample.int(1000,500), naics = sample(c('621111','621112','621210','621310','621320','621330','621340',
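
One possible approach in base R, sketched under assumptions: the vector `naics` and the set `keep` are small hypothetical examples, not the data from the question. Assigning the same label to several levels merges them.

```r
# Hypothetical data: a factor of codes, of which only `keep` should survive.
naics <- factor(c("621111", "621112", "621210", "621111", "621310", "621112"))
keep <- c("621111", "621112")

# Relabel every level that is not in `keep`; duplicated labels are merged
# into a single "other" level.
collapsed <- naics
levels(collapsed)[!(levels(collapsed) %in% keep)] <- "other"

print(table(collapsed))
```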

Concatenate rows of a data frame

旧街凉风 submitted on 2019-11-27 13:30:00
I would like to take a data frame with characters and numbers, and concatenate all of the elements of each row into a single string, which would be stored as a single element in a vector. As an example, I make a data frame of letters and numbers, and then I would like to concatenate the first row via the paste function, and hopefully return the value "A1".

    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
    df
    ##   letters numbers
    ## 1       A       1
    ## 2       B       2
    ## 3       C       3
    ## 4       D       4
    ## 5       E       5
    paste(df[1,], sep =".")
    ## [1] "1" "1"

So paste is converting each element of the row into an integer that
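
A short sketch of one way to get row-wise strings, assuming an empty separator is wanted (the question asks for "A1"). `do.call` hands each column to `paste` as its own argument, so `paste` works element-wise down the rows instead of coercing a one-row data frame.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)

# Each column becomes one argument to paste(); sep = "" joins them without
# a separator, giving one string per row.
row_strings <- do.call(paste, c(df, sep = ""))
print(row_strings)
```

`row_strings[1]` is then the concatenation of the first row.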

How to change order of boxplots when using ggplot2?

你。 submitted on 2019-11-27 12:45:19
This question follows from this other one. I was unable to implement the answers there. Define:

    df2 <- data.frame(variable=rep(c("vnu.shr","vph.shr"),each=10), value=seq(1:20))

Plot:

    require(ggplot2)
    qplot(variable,value, data=df2,geom="boxplot")+ geom_jitter(position=position_jitter(w=0.1,h=0.1))

I would like to have the boxplots in the reverse order (e.g. the one on the right moved to the left, and so on). I have tried various ways of reordering the factors using levels, ordered, relevel, rev and so on, but I simply cannot seem to get the syntax right.

Have you tried this:

    df2$variable <- factor(df2$variable,
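
A hedged sketch of the level-reversal idea, not necessarily what the truncated answer goes on to say. Discrete axes in ggplot2 follow the order of the factor levels, so reversing the levels before plotting should reverse the boxes; the plotting call itself is left out so the snippet runs without ggplot2 installed.

```r
df2 <- data.frame(variable = rep(c("vnu.shr", "vph.shr"), each = 10),
                  value = 1:20)

# Rebuild the factor with its levels in reverse order; the data values
# themselves are unchanged, only the level order differs.
df2$variable <- factor(df2$variable,
                       levels = rev(levels(factor(df2$variable))))
print(levels(df2$variable))
```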

Colouring plot by factor in R

那年仲夏 submitted on 2019-11-27 11:49:13
I am making a scatter plot of two variables and would like to colour the points by a factor variable. Here is some reproducible code:

    data <- iris
    plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)

This is all well and good, but how do I know which factor level has been given which colour?

    data<-iris
    plot(data$Sepal.Length, data$Sepal.Width, col=data$Species)
    legend(7,4.3,unique(data$Species),col=1:length(data$Species),pch=1)

should do it for you. But I prefer ggplot2 and would suggest it for better graphics in R.

The command palette tells you the colours and their order when col =
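
To make the level-to-colour mapping explicit, a small sketch: when `col` is a factor, base graphics use its integer codes as indices into `palette()`. The name `colour_key` is introduced here for illustration only.

```r
species <- iris$Species

# Level i of the factor is drawn with the i-th entry of the current palette.
colour_key <- setNames(palette()[seq_along(levels(species))], levels(species))
print(colour_key)

# The same vectors can feed a legend on an existing plot, for example:
# legend("topright", legend = names(colour_key), col = colour_key, pch = 1)
```

The actual colour names depend on the R version and on whether the palette has been changed.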

How to fill NAs with LOCF by factors in data frame, split by country

主宰稳场 submitted on 2019-11-27 11:18:40
Question: I have the following data frame (simplified), where the country variable is a factor and the value variable has missing values:

    country value
    AUT     NA
    AUT     5
    AUT     NA
    AUT     NA
    GER     NA
    GER     NA
    GER     7
    GER     NA
    GER     NA

The following generates the above data frame:

    data <- data.frame(country=c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"), value=c(NA, 5, NA, NA, NA, NA, 7, NA, NA))

Now, I would like to replace the NA values in each country subset using the method last observation carried
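
A base-R sketch of grouped LOCF, under the assumption that leading NAs (with no earlier observation in the same country) should stay NA. The helper `locf` is hypothetical and written here for illustration; the zoo package's `na.locf` is a commonly used alternative.

```r
data <- data.frame(
  country = c("AUT", "AUT", "AUT", "AUT", "GER", "GER", "GER", "GER", "GER"),
  value = c(NA, 5, NA, NA, NA, NA, 7, NA, NA)
)

# Hypothetical helper: carry the last non-NA value forward.
locf <- function(x) {
  # Position of the most recent non-NA element (0 if there is none yet).
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA  # nothing to carry forward yet: keep NA
  x[idx]
}

# ave() applies the helper within each country and keeps the row order.
data$filled <- ave(data$value, data$country, FUN = locf)
print(data)
```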

Convert factor to integer in a data frame

不羁岁月 submitted on 2019-11-27 09:09:53
I have the following code:

    anna.table <- data.frame(anna1, anna2)
    write.table(anna.table, file="anna.file.txt", sep='\t', quote=FALSE)

My table in the end contains numbers such as the following:

    chr  start    end      score
    chr2 41237927 41238801 151
    chr1 36976262 36977889 226
    chr8 83023623 83025129 185

and so on. After that I am trying to get only the values that fit some criteria, such as a score less than a specific value, so I am doing the following:

    anna3 <- "data/anna/anna.file.txt"
    anna.total <- read.table(anna3, header=TRUE)
    significant.anna <- subset(anna.total, score <= 0.001)

    Error: In Ops.factor
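
A minimal sketch of the conversion the title asks about, assuming the score column was read as a factor. The vector `score` is a hypothetical stand-in built from the three values shown in the question.

```r
# Hypothetical stand-in for anna.total$score read as a factor.
score <- factor(c("151", "226", "185"))

# Direct conversion gives the level codes (positions in the sorted levels).
codes <- as.integer(score)
print(codes)  # 1 3 2

# Going through character recovers the numbers that were actually stored.
score_int <- as.integer(as.character(score))
print(score_int)  # 151 226 185
```

Once the column is numeric, comparisons such as `score <= 0.001` no longer go through `Ops.factor`.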

How can I compare two factors with different levels?

大城市里の小女人 submitted on 2019-11-27 08:13:12
Question: Is it possible to compare two factors of the same length but with different levels? For example, if we have these 2 factor variables:

    A <- factor(1:5)
    str(A)
    Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5
    B <- factor(c(1:3,6,6))
    str(B)
    Factor w/ 4 levels "1","2","3","6": 1 2 3 4 4

If I try to compare them using, for example, the == operator:

    mean(A == B)

I get the following error:

    Error in Ops.factor(A, B) : level sets of factors are different

Answer 1: Convert to character, then compare:

    # data A <-
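
A sketch of the convert-to-character approach using the two factors from the question. Comparing the labels as strings sidesteps the level-set check that `==` performs on factors.

```r
A <- factor(1:5)
B <- factor(c(1:3, 6, 6))

# Compare the labels, not the factors themselves.
same <- as.character(A) == as.character(B)
print(same)
print(mean(same))  # share of matching positions: 0.6
```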

Why does as.factor return a character when used inside apply?

六眼飞鱼酱① submitted on 2019-11-27 07:29:41
I want to convert variables into factors using apply():

    a <- data.frame(x1 = rnorm(100), x2 = sample(c("a","b"), 100, replace = TRUE), x3 = factor(c(rep("a",50) , rep("b",50))))
    a2 <- apply(a, 2,as.factor)
    apply(a2, 2,class)

results in:

    x1          x2          x3
    "character" "character" "character"

I don't understand why this results in character vectors instead of factor vectors.

Marek: apply converts your data.frame to a character matrix. Use lapply:

    lapply(a, class)
    # $x1
    # [1] "numeric"
    # $x2
    # [1] "factor"
    # $x3
    # [1] "factor"

In the second command, apply converts the result to a character matrix. Using lapply:

    a2 <-
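
One way the lapply idea can be sketched, not necessarily the exact code of the truncated answer. Assigning into `a2[]` keeps the data frame structure while replacing every column with its factor version.

```r
a <- data.frame(x1 = rnorm(100),
                x2 = sample(c("a", "b"), 100, replace = TRUE),
                x3 = factor(c(rep("a", 50), rep("b", 50))))

# lapply works column by column on the list underneath the data frame,
# so no coercion to a character matrix takes place.
a2 <- a
a2[] <- lapply(a, as.factor)

print(sapply(a2, class))
```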

Why use as.factor() instead of just factor()

|▌冷眼眸甩不掉的悲伤 submitted on 2019-11-27 07:04:16
I recently saw Matt Dowle write some code with as.factor(), specifically

    for (col in names_factors) set(dt, j=col, value=as.factor(dt[[col]]))

in a comment to this answer. I used this snippet, but I needed to explicitly set the factor levels to make sure the levels appear in my desired order, so I had to change

    as.factor(dt[[col]])

to

    factor(dt[[col]], levels = my_levels)

This got me thinking: what (if any) is the benefit of using as.factor() versus just factor()?

as.factor is a wrapper for factor, but it allows a quick return if the input vector is already a factor:

    function (x) { if (is
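
A small sketch of one observable consequence of that quick return, using a hypothetical factor `f` with an unused level: `as.factor()` hands an existing factor back untouched, while `factor()` rebuilds it and drops levels that no longer occur.

```r
# Subsetting a factor keeps all of its levels, so "c" is now unused.
f <- factor(c("a", "b", "c"))[1:2]

kept <- levels(as.factor(f))  # returned as-is: "a" "b" "c"
dropped <- levels(factor(f))  # rebuilt: "a" "b"

print(kept)
print(dropped)
print(identical(as.factor(f), f))  # TRUE
```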