Best way to name objects programmatically using R?

Submitted by 我是研究僧i on 2019-12-01 06:41:28

When you save the model, also save a second object called 'name': a character string holding the name you want the model to have once it is loaded again:

> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m1"
> save(model,name,file="save1.rda")
> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m2"
> save(model,name,file="save2.rda")

Now each file knows what it wants its resulting object to be called. How do you get that back on load? Load into a new environment, and assign:

> e=new.env()
> load("save1.rda",env=e)
> assign(e$name,e$model)
> summary(m1)

Call:
lm(formula = y ~ x, data = d)

You can now safely rm or re-use the 'e' object. You can of course wrap this in a function:

> blargh=function(f){e=new.env();load(f,env=e);assign(e$name,e$model,.GlobalEnv)}
> blargh("save2.rda")
> m2

Call:
lm(formula = y ~ x, data = d)

Note that this is doubly bad practice. Firstly, you should probably store all the models in one file as a named list. Secondly, this function has side effects: if you already had an object called m2, it would get stomped on.
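One way to avoid the side effect is to have the loader return the model rather than assign it into the global environment. The following is only a sketch: load_named is a hypothetical helper name, and it assumes each file was written as above, with objects called 'model' and 'name'.

```r
# Hypothetical helper (not part of the original answer): load a file that
# contains 'model' and 'name', and return the model in a one-element named
# list instead of assigning it into .GlobalEnv.
load_named <- function(f) {
  e <- new.env()
  load(f, envir = e)
  setNames(list(e$model), e$name)
}

# Self-contained demo using a temporary file
d <- data.frame(x = 1:10, y = rnorm(10))
model <- lm(y ~ x, data = d)
name <- "m1"
f <- tempfile(fileext = ".rda")
save(model, name, file = f)
rm(model, name)

loaded <- load_named(f)
names(loaded)           # the name stored in the file
class(loaded[["m1"]])   # the model itself
```

The caller decides what to do with the result, for example combining several files with c(load_named(f1), load_named(f2)), so nothing in the workspace gets overwritten silently.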

Using assign like this is nearly always a sign (dyswidt?) that you should use a list instead.

B

There is a fair amount of guesswork involved in this answer but I think this could help:

# get a vector with the column names in data_resp
modNames <- colnames( data_resp )

# create empty list
models <- as.list( NULL )

# iterate through your columns and assign the result as list members
for( n in modNames )
{
  models[[n]] <- train(data_pred_scale[!is.na(data_resp[, n]), ],  ### this may need modification, can't test without data
                 data_resp[!is.na(data_resp[, n]), n],
                 method = "rf",
                 tuneGrid = data.frame(.mtry = c(3:6)),
                 nodesize = 3,
                 ntree = 500)  # randomForest's argument is ntree, not ntrees
}

# save the whole bunch
save( models, file = "models.rda" )

You can now retrieve this one object, the list with all your models, with just load("models.rda"), and address its members with list notation, either by position as models[[1]] or by column name, e.g. models[["first"]].
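Since the loop above cannot be run without the original data, here is a small stand-in that shows the same save/load round trip. It is a sketch only: it assumes the built-in mtcars data and uses lm() in place of train(), and the column choices are arbitrary.

```r
# Stand-in for the real data: two mtcars columns play the role of responses
resp_cols <- c("mpg", "hp")

models <- list()
for (n in resp_cols) {
  # reformulate() builds the formula <n> ~ wt from strings
  models[[n]] <- lm(reformulate("wt", response = n), data = mtcars)
}

# Save the whole list to a temporary file, then remove it from the workspace
f <- tempfile(fileext = ".rda")
save(models, file = f)
rm(models)

# load() restores the object under its original name and (invisibly)
# returns the names of everything it restored
loaded_names <- load(f)
loaded_names
names(models)
coef(models[["mpg"]])
```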

I think the other answers about doing this with a loop are great. I used this as a chance to finally try and understand lapply better, as many of the StackOverflow questions about how to do this ended up suggesting the use of lists and lapply instead of loops.

I really like the idea of combining all results of train() into a list (which @vaettchen did in his loop), and in thinking about how to do this with a list, this is what I came up with. First, I needed my data.frame in list form, one entry per column. Since I don't really work with lists, I hunted around until just trying as.list(df), which worked like a charm.
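As a quick illustration of what as.list() does to a data.frame (the toy data frame here is made up for the example):

```r
# A toy data.frame with two columns
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# as.list() gives one list element per column, keeping the column names
cols <- as.list(df)
names(cols)
length(cols)
cols[["a"]]
```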

Next, I want to apply my train function to each element of my list of measured response variables, so I defined the function like this:

# predictors are stored in data_pred
# responses are in data_resp (one per column)
# rows in data_pred/data_resp (perhaps obviously) match, one per observation

train_func <- function(y) { train(x = data_pred, y = y,
   method = "rf", tuneGrid = data.frame(.mtry = 3:6),
   ntree = 500) }  # randomForest's argument is ntree, not ntrees

Now I just need to use lapply to apply the train() call on each element of data_resp. I didn't know how to create an empty, placeholder list, so thanks to @vaettchen for that (I was trying list_name <- list() without success):

models <- lapply(as.list(data_resp), train_func)

Awesomely, I found that models has its elements automatically named after my column names in data_resp, which is just fantastic. I'm using this in conjunction with the shiny package, so this will make it incredibly easy for the user to select a response variable from a drop-down (which can store the response variable name) and do:

predict(models[["resp_name"]], new_data)
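For anyone who wants to try the pattern without caret installed, here is a rough sketch of the same idea. It assumes the built-in mtcars data as a stand-in for data_pred and data_resp, and fit_func (a hypothetical name) uses lm() instead of train().

```r
# Stand-ins for the real data, taken from mtcars
data_pred <- mtcars[, c("wt", "cyl")]   # predictors
data_resp <- mtcars[, c("mpg", "hp")]   # responses, one per column

# Fit one model per response vector; lm() stands in for train()
fit_func <- function(y) lm(y ~ ., data = cbind(data_pred, y = y))

models <- lapply(as.list(data_resp), fit_func)

# lapply() keeps the names of its input, so the models are named by column
names(models)

new_data <- data.frame(wt = 3, cyl = 6)
pred <- predict(models[["mpg"]], new_data)
pred
```

Because a data.frame is itself a list of columns, lapply(data_resp, fit_func) gives the same result without the explicit as.list().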

I think this is much better than the loop based approach and everything just happened to fall in place nicely. I realize the question explicitly asked for naming variables programmatically, so apologies if that pushed others to answer in that fashion vs. a "bigger picture" answer. The ease of lapply suggests I was trying to force a particular solution when a (at least to my eyes) much better one existed.


Bonus: I didn't realize lists could be multi-dimensional (that is, nested: a list whose elements are themselves lists), but on trying it, it turns out they can be! This is even better, as I'm using numerous algorithms and I can store everything in one big list object.

 func_rf <- function(y) { train(x = data_pred, y = y,
     method = "rf", tuneGrid = data.frame(.mtry = 3),
     ntree = 100) }  # randomForest's argument is ntree, not ntrees

 # svmRadial method requires formula syntax to work with factors,
 # so the train function has to be a bit different
 # add `scale = F` since I had to preProcess the numeric vars ahead of time
 # and cbind to the factors. Without it, caret will try to scale the data
 # for you, which fails for factors

 func_svm <- function(y) { train(y ~ ., cbind(data_pred, y),
     method = "svmRadial", tuneGrid = data.frame(.C = 1, .sigma = .2),
     scale = F) }

 model_list <- list()  # list(NULL) would leave a stray NULL as the first element
 model_list$rf <- lapply(as.list(data_resp), func_rf)
 model_list$svm <- lapply(as.list(data_resp), func_svm)

Now I can refer to the desired model and response variable with list syntax!

 predict(model_list[["svm"]][["response_variable"]], new_data)
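The nested structure can be tried out without caret. This is a sketch under stated assumptions: mtcars stands in for the real data, and func_full and func_wt are hypothetical names for two lm() variants that play the role of the two algorithms.

```r
# Stand-ins for the real data, taken from mtcars
data_pred <- mtcars[, c("wt", "cyl")]
data_resp <- mtcars[, c("mpg", "hp")]

# Two "algorithms": a model on all predictors and a model on wt only
func_full <- function(y) lm(y ~ ., data = cbind(data_pred, y = y))
func_wt   <- function(y) lm(y ~ wt, data = cbind(data_pred, y = y))

# Outer level: algorithm; inner level: response variable
model_list <- list()
model_list$full    <- lapply(as.list(data_resp), func_full)
model_list$wt_only <- lapply(as.list(data_resp), func_wt)

names(model_list)
names(model_list[["wt_only"]])

pred <- predict(model_list[["wt_only"]][["hp"]], data.frame(wt = 3, cyl = 6))
pred
```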

Super happy with this, and hopefully it makes the code more efficient and faster. I really love the "meta-object" I end up with, compared with a ton of files, one per model/response variable combination, that I would have to load one at a time later on.

A bit of an old question but still without an accepted answer.
As I understand, you need to programmatically rename a variable and save it so that when reloaded it keeps the new name.
Try this:

saveWithName = function(var.name, obj){
  # var.name is a string with the name of the variable you want to assign
  # obj is any kind of R object (data.frame, list, etc.) you want to rename and save
  assign(var.name, obj)
  save(list=var.name, file=sprintf("model_%s.RData", var.name))
}

saveWithName("lab1", c(1,2))
saveWithName("lab2", c(3,4))
load("model_lab1.RData")
load("model_lab2.RData")

print(lab1)
#> [1] 1 2
print(lab2)
#> [1] 3 4
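A related option, not used in the answers above, is base R's saveRDS()/readRDS() pair. It stores a single object without its name, so the name is chosen when reading the file back rather than when saving it. A minimal sketch with a temporary file:

```r
# Save one object; no variable name is stored in the file
f <- tempfile(fileext = ".rds")
saveRDS(c(1, 2), file = f)

# The name is decided here, at load time
lab1 <- readRDS(f)
print(lab1)
#> [1] 1 2
```

This sidesteps assign() entirely, at the cost of writing one file per object.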