r-factor

Recode, collapse, and order factor levels using a single function with regex matching

三世轮回 submitted on 2020-07-05 13:33:13
Question: I find manipulating factor variables in R unduly complicated. Things I frequently want to do when cleaning factors include:
- Re-sorting levels: not just to set a reference category, but also to put all levels in a logical (non-alphabetical) order for summary tables. x <- factor(x, levels = new.order)
- Recoding / renaming factor levels: to simplify names and/or collapse multiple categories into one group. For one-to-one recoding, levels(x) <- new.levels(x) or plyr::revalue , see here or here for
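A minimal sketch of what such a single function could look like, using base R only. The helper name recode_factor_regex and its rules argument are hypothetical (they do not come from the original post): rules is a named character vector whose names are the new levels and whose values are regular expressions, and the order of rules sets the order of the resulting levels.

```r
# Hypothetical helper: recode, collapse and order factor levels in one call.
# Assumes the names of `rules` are unique. Values matching no pattern become NA.
recode_factor_regex <- function(x, rules) {
  old <- as.character(x)
  new <- rep(NA_character_, length(old))
  for (lvl in names(rules)) {
    # Fill only values not claimed by an earlier rule (first match wins).
    hit <- is.na(new) & !is.na(old) & grepl(rules[[lvl]], old)
    new[hit] <- lvl
  }
  # Level order follows the order in which the rules were written.
  factor(new, levels = names(rules))
}

x <- factor(c("apple", "banana", "cherry", "apricot", "kiwi"))
y <- recode_factor_regex(x, c(other = "^k", a_fruit = "^a", b_or_c = "^(b|c)"))
levels(y)  # "other" "a_fruit" "b_or_c"
table(y)
```

Here "banana" and "cherry" collapse into one group, and "other" sorts first because it is listed first, not because of alphabetical order.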

Convert factor to integer while maintaining factor level ordering

喜夏-厌秋 submitted on 2020-06-27 10:01:49
Question: I have an R data frame in which one of the columns is a factor whose levels have an implicit ordering. How can I convert the factor levels to specific integers in the following manner:
    "Strongly disagree" --> 1
    "Somewhat disagree" --> 2
    "Neutral" --> 3
    "Somewhat agree" --> 4
    "Strongly agree" --> 5
For example, here is my data frame:
    agree <- c("Strongly agree", "Somewhat disagree", "Somewhat agree", "Neutral", "Strongly agree", "Strongly disagree", "Neutral")
    age <- c(41, 35, 29, 42, 31, 22, 58)
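One common way to get this mapping, sketched on the agree vector from the question: state the level order explicitly when building the factor, then call as.integer(), which returns each value's position among the levels. The names scale_levels, agree_f and agree_int are made up for this illustration.

```r
agree <- c("Strongly agree", "Somewhat disagree", "Somewhat agree", "Neutral",
           "Strongly agree", "Strongly disagree", "Neutral")

# Levels listed in the intended order, so position 1 is "Strongly disagree".
scale_levels <- c("Strongly disagree", "Somewhat disagree", "Neutral",
                  "Somewhat agree", "Strongly agree")

agree_f <- factor(agree, levels = scale_levels, ordered = TRUE)

# as.integer() on a factor gives the index of each value's level.
agree_int <- as.integer(agree_f)
agree_int  # 5 2 4 3 5 1 3
```

Without the explicit levels argument the levels would be sorted alphabetically, and the integer codes would not follow the agreement scale.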

How to scale a variable by group

北城以北 submitted on 2020-05-12 02:45:46
Question: I would really appreciate your help with this question. I have the following dataset and I would like to create a new variable containing the standardized values (z-scores) per level of a given factor variable.
    x <- data.frame(gender = c("boy","boy","boy","girl","girl","girl"), values=c(1,2,3,6,7,8))
    x
      gender values
    1    boy      1
    2    boy      2
    3    boy      3
    4   girl      6
    5   girl      7
    6   girl      8
My aim is to create one new variable which will contain the z-values calculated separately for each factor level
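A minimal base R sketch of one possible approach, using ave() to apply a function within each group and return a vector aligned with the original rows. The column name z is an assumption chosen for this example.

```r
x <- data.frame(gender = c("boy", "boy", "boy", "girl", "girl", "girl"),
                values = c(1, 2, 3, 6, 7, 8))

# ave() splits `values` by `gender`, applies FUN to each piece,
# and puts the results back in the original row order.
x$z <- ave(x$values, x$gender, FUN = function(v) (v - mean(v)) / sd(v))
x$z  # -1 0 1 -1 0 1
```

Both groups give the same z-scores here because each has values spaced one unit apart with a standard deviation of 1.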

R How to convert a numeric into factor with predefined labels

前提是你 submitted on 2020-01-30 11:22:38
Question:
    labs = letters[3:7]
    vec = rep(1:5,2)
How do I get a factor whose levels are "c" "d" "e" "f" "g"?
Answer 1: You can do something like this:
    labs = letters[3:7]
    vec = rep(1:5,2)
    factorVec <- factor(x=vec, levels=sort(unique(vec)), labels = c("c", "d", "e", "f", "g"))
I have sorted unique(vec) to make the results consistent. unique() returns the unique values in order of first occurrence, so specifying the order explicitly makes the code more robust. Also by specifying the levels
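To illustrate why sorting matters, here is a small sketch with a shuffled input. The vector vec2 and the names f_unsorted and f_sorted are made up for this example and are not part of the original answer.

```r
labs <- letters[3:7]

# A shuffled input: unique() keeps first-occurrence order, here 3 1 2 5 4.
vec2 <- c(3, 1, 2, 5, 4)

# Without sorting, labels attach to values in order of appearance (3 -> "c").
f_unsorted <- factor(vec2, levels = unique(vec2), labels = labs)
as.character(f_unsorted)  # "c" "d" "e" "f" "g"

# With sorting, labels attach to values in numeric order (1 -> "c", 3 -> "e").
f_sorted <- factor(vec2, levels = sort(unique(vec2)), labels = labs)
as.character(f_sorted)    # "e" "c" "d" "g" "f"
```

Both factors have the levels "c" to "g", but only the sorted version maps each number to the same letter regardless of the order of the input.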

R: Why am I not getting type or class “factor” after converting columns to factor?

心已入冬 submitted on 2020-01-23 13:37:28
Question: I have the following setup.
    df <- data.frame(aa = rnorm(1000), bb = rnorm(1000))
    apply(df, 2, typeof)
    #       aa       bb
    # "double" "double"
    apply(df, 2, class)
    #        aa        bb
    # "numeric" "numeric"
Then I try to convert one of the columns to "factor". But as you can see below, I am not getting any "factor" type or class. Am I doing anything wrong?
    df[, 1] <- as.factor(df[, 1])
    apply(df, 2, typeof)
    #          aa          bb
    # "character" "character"
    apply(df, 2, class)
    #          aa          bb
    # "character" "character"
Answer 1: Sorry I felt my
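A short sketch of what is likely going on, using a smaller data frame than in the question. apply() first converts the data frame to a matrix via as.matrix(), and a matrix can hold only one type, so a factor column mixed with a numeric column turns everything into character. Inspecting the columns with sapply() (or class() on a single column) avoids that conversion.

```r
set.seed(1)
df <- data.frame(aa = rnorm(10), bb = rnorm(10))
df[, 1] <- as.factor(df[, 1])

# The column really is a factor.
class(df$aa)          # "factor"

# sapply() walks over the columns as they are, without building a matrix.
col_classes <- sapply(df, class)
col_classes           # aa: "factor", bb: "numeric"

# apply() coerces the whole data frame to a character matrix first.
apply(df, 2, class)   # aa: "character", bb: "character"
```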

ggplot2 y-axis order changes after subsetting

谁都会走 submitted on 2020-01-03 03:02:08
Question: I have a function that works as expected until I subset the data. The function, plotCalendar(), is my attempt at a calendar heat map using ggplot2 with facets. The y-axis order is important because it represents the "WeekOfMonth": when the order is reversed, the visualization no longer looks like a calendar. The code is below: first the calling code, then the function that generates some data, generateData(), then the plot function, plotCalendar(). The code works as expected when I use df for the data but when

Logistic Regression on factor: Error in eval(family$initialize) : y values must be 0 <= y <= 1

核能气质少年 submitted on 2019-12-26 07:45:06
Question: I am unable to fix the error below for the following logistic regression:
    training=(IBM$Serial<625)
    data=IBM[!training,]
    dim(data)
    stock.direction <- data$Direction
    training_model=glm(stock.direction~data$lag2,data=data,family=binomial)
    ###Error### ----
    Error in eval(family$initialize) : y values must be 0 <= y <= 1
A few rows from the data I am using:
    X Date Open High Low Close Adj.Close Volume Return lag1 lag2 lag3 Direction Serial
    1 28-11-2012 190.979996 192.039993 189.270004 191.979996 165.107727
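The message says that the response handed to a binomial glm() contains values outside the interval [0, 1]. The sketch below reproduces the message with made-up data and shows one way to avoid it, by coding the response as 0/1 explicitly. Everything here is hypothetical: the toy data frame, the column direction_code and its 1/2 coding are assumptions for illustration, not the IBM data from the question, whose Direction values are not visible in the excerpt.

```r
set.seed(42)
# Toy data: a numeric predictor and an outcome coded as 1/2 instead of 0/1.
toy <- data.frame(lag2 = rnorm(50),
                  direction_code = sample(1:2, 50, replace = TRUE))

# A numeric response containing the value 2 triggers the error from the question.
err <- tryCatch(
  glm(direction_code ~ lag2, data = toy, family = binomial),
  error = function(e) conditionMessage(e)
)
err

# Recode the outcome to 0/1 and refer to columns by bare name inside the formula.
toy$up <- as.integer(toy$direction_code == 2)
fit <- glm(up ~ lag2, data = toy, family = binomial)
coef(fit)
```

Checking str() of the response column before fitting is a quick way to see whether it is a factor, a character vector, or a number coded outside 0/1.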
