Creating formula using very long strings in R

喜你入骨 submitted on 2019-12-19 21:52:09

Question


I'm in a situation where I have a vector full of column names for a really large data frame.

Let's assume: x = c("Name", "address", "Gender", ......, "class" ) [approximately 100 variables]

Now, I would like to create a formula which I'll eventually use to create a HoeffdingTree. I'm creating the formula using:

myformula <- as.formula(paste("class ~ ", paste(x, collapse= "+")))

This throws the following error:

Error in parse(text = x) : :1:360: unexpected 'else' 1:e+spread+prayforsonni+just+want+amp+argue+blxcknicotine+mood+now+right+actually+herapatra+must+simply+suck+there+always+cookies+ever+everything+getting+nice+nigga+they+times+abu+all+alliepickl

The paste part of the above statement works fine, but passing its result to as.formula throws all kinds of weird errors.


Answer 1:


The problem is that some of your column names are R keywords. else is a reserved word, so you can't use it as a regular (unquoted) name.

A simplified example:

s <- c("x", "else", "z")
f <- paste("y~", paste(s, collapse="+"))
formula(f)
# Error in parse(text = x) : <text>:1:10: unexpected '+'
# 1: y~ x+else+
#              ^
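
To locate the offending names in a long vector, one possible check (not part of the original answer) is to compare each name with the output of make.names(), which alters reserved words and other non-syntactic names. The vector x below is a made-up stand-in for the real column names.

```r
# Hypothetical column names: "else" is a reserved word, "my var" contains a space.
x <- c("Name", "else", "my var", "class")

# make.names() rewrites anything that is not a valid unquoted name,
# so the names it changes are the ones that would need backticks.
needs_quoting <- x[x != make.names(x)]
print(needs_quoting)
```

Any name reported here would break a formula string built with a plain paste().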

The solution is to wrap your words in backticks "`" so that R will treat them as non-syntactic variable names.

f <- paste("y~", paste(sprintf("`%s`", s), collapse="+"))
formula(f)
# y ~ x + `else` + z
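
Applied to the formula from the question, the backtick approach might look like the sketch below. The column names are invented for illustration, and "class" is dropped from the predictors with setdiff() on the assumption that it should only appear on the left-hand side.

```r
# Hypothetical column names, including a reserved word and a name with a space.
x <- c("Name", "else", "my var", "class")

# Keep the response out of the right-hand side.
predictors <- setdiff(x, "class")

# Wrap every predictor in backticks before pasting the formula text.
rhs <- paste(sprintf("`%s`", predictors), collapse = " + ")
myformula <- as.formula(paste("class ~", rhs))

print(myformula)
# class ~ Name + `else` + `my var`
```

Quoting every name, rather than only the problematic ones, keeps the code simple and is harmless for ordinary names.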



Answer 2:


You can reduce your data set first:

dat_small <- dat[, union("class", x)]  # union() avoids duplicating "class" if x already contains it

and then use

myformula <- as.formula("class ~ .")

The . means "use all other columns", i.e. every column except class.
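
A small sketch of this approach, using a made-up data frame dat that has a column literally named else: because the formula text never contains the column names, nothing has to be parsed or quoted. The terms() call is only there to show what the dot expands to.

```r
# Hypothetical data; check.names = FALSE keeps the column name "else" unchanged.
dat <- data.frame(
  class  = c("a", "b", "a"),
  Name   = c("n1", "n2", "n3"),
  `else` = c(1, 0, 1),
  unused = c(TRUE, FALSE, TRUE),
  check.names = FALSE
)
x <- c("Name", "else", "class")

# Subset by character vector, so reserved words are not a problem here.
dat_small <- dat[, union("class", x)]

myformula <- as.formula("class ~ .")

# Expand the dot against the reduced data to see which terms are used.
term_labels <- attr(terms(myformula, data = dat_small), "term.labels")
print(term_labels)
```

The dot is only resolved once the data are supplied, so the modelling function must receive dat_small rather than the full data frame.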




Answer 3:


You may try reformulate:

reformulate(setdiff(x, 'class'), response='class')
# class ~ Name + address + Gender

where 'x' is

x <- c("Name", "address", "Gender", 'class')

If 'x' contains R keywords, you can do

reformulate('.', response='class')
# class ~ .
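
Another option when some names are keywords, offered here as an assumption rather than as part of the answer above, is to backquote the term labels before handing them to reformulate(), since it parses the labels as R code. The names in x are hypothetical.

```r
# Hypothetical column names, including a reserved word and a name with a space.
x <- c("Name", "else", "my var", "class")

# Backquote each predictor so that reformulate() can parse it as a name.
labels <- sprintf("`%s`", setdiff(x, "class"))
f <- reformulate(labels, response = "class")

print(f)
```

This keeps the explicit list of predictors, which can be useful when the data frame holds columns that should not enter the model.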


Source: https://stackoverflow.com/questions/29555473/creating-formula-using-very-long-strings-in-r
