I'm running various modeling algorithms on a data set. I've had best results by modeling my input variables to my responses one at a time, e.g.:
model <- train(y ~ x1 + x2 + ... + xn, ...)
Once I train my models, I'd like to not re-run them each time, so I've been trying to save them as .rda
files. Here's an example loop for a random forest model (feel free to suggest a better way than a loop!):
# data_resp contains my measured responses, one per column
# data_pred contains my predictors, one per column
for (i in 1:ncol(data_resp)) {
model <- train(data_pred_scale[!is.na(data_resp[, i]), ],
data_resp[!is.na(data_resp[, i]), i],
method = "rf",
tuneGrid = data.frame(.mtry = c(3:6)),
nodesize = 3,
ntrees = 500)
save(model, file = paste("./models/model_rf_", names(data_resp)[i], ".rda", sep = ""))
When I load the model, however, it's going to be called model
.
I haven't found a good way to save the model with it's corresponding name to try and refer back to it later. I found that one can assign an object to a string like so:
assign(paste("./models/model_rf_", names(data_resp)[i], ".rda", sep = ""), train(...))
But I'm still left with how to refer to the object when I save it:
save(???, file = ...)
I don't know how to call the object by it's custom name.
Lastly, even loading has presented a problem. I've tried assign("model_name", load("./model.rda"))
, but the resultant object, called string
ends up just holding a string of the object name, "model".
In looking around, I found THIS question, which seems relevant, but I'm trying to figure out how to apply it to my situation.
I could create a list with the names of each column name in data_resp
(my measured responses) and then use lapply
to use train()
, but I'm still a bit stuck on how to dynamically refer to the new object name to keep the resultant model in.
When you save the model, save another object called 'name' which is a character string of the thing you want to name it as:
> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m1"
> save(model,name,file="save1.rda")
> d=data.frame(x=1:10,y=rnorm(10))
> model=lm(y~x,data=d)
> name="m2"
> save(model,name,file="save2.rda")
Now each file knows what it wants its resulting object to be called. How do you get that back on load? Load into a new environment, and assign:
> e=new.env()
> load("save1.rda",env=e)
> assign(e$name,e$model)
> summary(m1)
Call:
lm(formula = y ~ x, data = d)
You can now safely rm or re-use the 'e' object. You can of course wrap this in a function:
> blargh=function(f){e=new.env();load(f,env=e);assign(e$name,e$model,.GlobalEnv)}
> blargh("save2.rda")
> m2
Call:
lm(formula = y ~ x, data = d)
Note this is a double bad thing to do - firstly, you should probably store all the models in one file as a list with names. Secondly, this function has side effects, and if you had an object called m2
already it would get stomped on.
Using assign
like this is nearly always a sign (dyswidt?) that you should use a list instead.
B
There is a fair amount of guesswork involved in this answer but I think this could help:
# get a vector with the column names in data_resp
modNames <- colnames( data_resp )
# create empty list
models <- as.list( NULL )
# iterate through your columns and assign the result as list members
for( n in modNames )
{
models[[n]] <- train(data_pred_scale[!is.na(data_resp[, n]), ], ### this may need modification, can't test without data
data_resp[!is.na(data_resp[, n]), n],
method = "rf",
tuneGrid = data.frame(.mtry = c(3:6)),
nodesize = 3,
ntrees = 500)
}
# save the whole bunch
save( models, file = "models.rda" )
You can now retrieve, just with load( "models.rda )
, this one object, the list with all your models, and address them with list notation, either as models[[1]]
or with the column name, eg. models[["first"]]
.
I think the other answers about doing this with a loop are great. I used this as a chance to finally try and understand lapply
better, as many of the StackOverflow questions about how to do this ended up suggesting the use of lists and lapply
instead of loops.
I really like the idea of combining all results of train()
into a list (which @vaettchen did in his loop), and in thinking about how to do this with a list, this is what I came up with. First, I needed my data.frame in list form, one entry per column. Since I don't really work with lists, I hunted around until just trying as.list(df)
, which worked like a charm.
Next, I want to apply my train function to each element of my list of measured response variables, so I defined the function like this:
# predictors are stored in data_pred
# responses are in data_resp (one per column)
# rows in data_pred/data_resp (perhaps obviously) match, one per observation
train_func <- function(y) { train(x = data_pred, y = y,
method = "rf", tuneGrid = data.frame(.mtry = 3:6),
ntrees = 500) }
Now I just need to use lapply
to apply the train()
call on each element of data_resp
. I didn't know how to create an empty, placeholder list, so thanks to @vaettchen for that (I was trying list_name <- list()
without success):
models <- lapply(as.list(data_resp), train_func)
Awesomely, I found that models
has it's elements automatically named to my column names in data_resp
, which is just fantastic. I'm using this in conjunction with the shiny
package, so this will make it incredibly easy for the user to select a response variable from a drop down (which can store the response variable name) and do:
predict(models[["resp_name"]], new_data)
I think this is much better than the loop based approach and everything just happened to fall in place nicely. I realize the question explicitly asked for naming variables programmatically, so apologies if that pushed others to answer in that fashion vs. a "bigger picture" answer. The ease of lapply
suggests I was trying to force a particular solution when a (at least to my eyes) much better one existed.
Bonus: I didn't realize lists could be multi-dimensional, but in trying it, it appears they can be! This is even better, as I'm using numerous algorithms and I can store everything in one big list object.
func_rf <- function(y) { train(x = data_pred, y = y,
method = "rf", tuneGrid = data.frame(.mtry = 3),
ntrees = 100) }
# svmRadial method requires formula syntax to work with factors,
# so the train function has to be a bit different
# add `scale = F` since I had to preProcess the numeric vars ahead of time
# and cbind to the factors. Without it, caret will try to scale the data
# for you, which fails for factors
func_svm <- function(y) { train(y ~ ., cbind(data_pred, y),
method = "svmRadial", tuneGrid = data.frame(.C = 1, .sigma = .2),
scale = F) }
model_list <- list(NULL)
model_list$rf <- lapply(as.list(data_resp), func_rf)
model_list$svm <- lapply(as.list(data_resp), func_svm)
Now I can refer the desired model and response variable with list syntax!
predict(model_list[["svm"]][["response_variable"]], new_data)
Super happy with this and hopefully it makes the code more efficient, faster, and I really love the "meta-object" I end up with vs. a ton of files, one per model/response variable combination, that I have to load in one at a time later on.
A bit of an old question but still without an accepted answer.
As I understand, you need to programmatically rename a variable and save it so that when reloaded it keeps the new name.
Try this:
saveWithName = function(var.name, obj){
# var.name is a string with the name of the variable you want to assign
# obj is any kind of R object (data.frame, list, etc.) you want to rename and save
assign(var.name, obj)
save(list=var.name, file=sprintf("model_%s.RData", var.name))
}
saveWithName("lab1", c(1,2))
saveWithName("lab2", c(3,4))
load("model_lab1.RData")
load("model_lab2.RData")
print(lab1)
#>[1] 1 2
print(lab2)
#[1] 3 4
来源:https://stackoverflow.com/questions/17769121/best-way-to-name-objects-programmatically-using-r