Question
sapply and replicate (etc.) run a specified number of times. Writing sapply(1:N, function(n){expr}) will execute expr N times. Suppose I wanted sapply to stop after m runs. Is this possible without raising an error? break doesn't work, and a for or while loop would be too slow in my context.
Something akin to:
sapply(1:N, function(n){
  #some expression
  if(identical(n, m)) break
})
except that break doesn't work.
What I'm trying to do:
Creating a function to read in large (binary) data files of a defined structure but unknown length. Using replicate with array(readBin(...), ...) is the best way I've found to do this, but I'd like it to stop when NA starts to be returned (i.e. end of file is reached).
Answer 1:
A partial workaround is to use a global control variable:
i <- TRUE
unlist(sapply(1:10, function(x) { if (i) { if (x >= 4) i <<- FALSE; 2 * x } }))
Although the function is still called N times, it skips the operation once the flag has been cleared, which spares resources. I can't quite work out why simplification did not go all the way, so I had to use unlist.
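The reason unlist is needed can be checked directly. Once the flag is FALSE, the if has no else branch, so the function returns NULL for the remaining calls; sapply cannot simplify a mix of length-1 and length-0 results and hands back a list. A minimal sketch of the same workaround, with the control variable renamed to keep_going (a hypothetical name, not from the answer above):

```r
# keep_going is a hypothetical name for the global control variable
keep_going <- TRUE

res <- sapply(1:10, function(x) {
  if (keep_going) {
    if (x >= 4) keep_going <<- FALSE  # clear the flag on the 4th call
    2 * x                             # value returned while the flag was TRUE on entry
  }                                   # otherwise the function returns NULL
})

is.list(res)   # TRUE: results have unequal lengths, so no simplification
lengths(res)   # 1 1 1 1 0 0 0 0 0 0
unlist(res)    # 2 4 6 8 (the NULL entries are dropped)
```

Note that the function is still called for all ten elements; only the body is skipped.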
Answer 2:
Aside from the for vs *apply battle, if your problem is to use readBin until the end of file is reached, keep in mind that:
- you can use a (small) overestimate for n (the number of elements to read);
- you can know the size of the file through file.info(filename)$size; then you can estimate by yourself how many elements are contained in the file.
For instance, say that you are reading integer (four bytes) values. Just try:
readBin(con,"int",n=file.info(filename)$size/4+10)
to read the whole file in one shot. The +10 is just a small overestimate.
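A small self-contained sketch of this idea, assuming a file of plain 4-byte integers. The temporary file and the names tmp, n_est and vals are made up for illustration. readBin simply returns fewer elements than requested when the file ends, so the overestimate is harmless:

```r
# Write a hypothetical demo file holding 1000 integers (4 bytes each)
tmp <- tempfile(fileext = ".bin")
writeBin(1:1000, tmp)

# Estimate the element count from the file size, plus a small margin
n_est <- file.info(tmp)$size / 4 + 10

con <- file(tmp, "rb")
vals <- readBin(con, "integer", n = n_est)  # reads until end of file at most
close(con)

length(vals)  # 1000, not 1010: only what the file actually contains
unlink(tmp)
```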
Answer 3:
Thought I'd post what I came up with for this. cbb is the array being built. In the bad example, cbb is progressively built up as an array using abind. At each step, the whole of cbb has to be copied into a new, larger array, so each successive step is slower - a growing object. In the good example, cbb is built up as a list and a new list entry is added at each step. R does not need to copy the arrays already stored in the list each time. The array is bound together at the end with do.call(abind, c(cbb, list(along = 4))).
One thing I found helpful was to put cat(".") within the loop. If the dot printing slows down, that might indicate a growing object. With the good example, dots are printed at a more or less constant rate. The good example is about ten times faster than the bad.
BAD:
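library(abind)  # provides abind()
cbb <- array(NA, c(N1, N2, N3, 0))
repeat{
  # read the two-integer record header; an empty result means end of file
  sptsnew <- readBin(to.read, "integer", 2L, 4L)
  if(identical(sptsnew, integer(0))){cat("\nend of file\n"); break}
  ... #reading array metadata
  # slow: abind builds a new, larger array on every iteration
  cbb <- abind(cbb, array(readBin(to.read, "double", N1*N2*N3, 4L), c(N1, N2, N3, 1)), along = 4)
  cat(".")
}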
GOOD:
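library(abind)  # provides abind()
i <- 1
cbb <- list()
repeat{
  # read the two-integer record header; an empty result means end of file
  sptsnew <- readBin(to.read, "integer", 2L, 4L)
  if(identical(sptsnew, integer(0))){cat("\nend of file\n"); break}
  ... #reading array metadata
  # fast: store each block as a new list entry
  cbb[[i]] <- array(readBin(to.read, "double", N1*N2*N3, 4L), c(N1, N2, N3, 1))
  i <- i + 1
  cat(".")
}
# bind all blocks along the 4th dimension once, at the end
cbb <- do.call(abind, c(cbb, list(along = 4)))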
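For anyone who wants to try the list-then-combine pattern without a real data file, here is a reduced sketch. It rests on several assumptions that are not in the code above: the records have no header or metadata, the dimensions and the five-record demo file are invented, and the final step uses base R's array(unlist(...)) in place of abind so that no extra package is needed.

```r
# Hypothetical record layout: N1*N2*N3 values stored as 4-byte floats
N1 <- 2L; N2 <- 3L; N3 <- 4L
rec_len <- N1 * N2 * N3

# Build a small demo file with 5 records (record k is filled with the value k)
tmp <- tempfile(fileext = ".bin")
out <- file(tmp, "wb")
for (k in 1:5) writeBin(as.numeric(rep(k, rec_len)), out, size = 4L)
close(out)

to.read <- file(tmp, "rb")
cbb <- list()
i <- 1
repeat {
  rec <- readBin(to.read, "double", rec_len, 4L)
  if (length(rec) < rec_len) break       # short or empty read: end of file
  cbb[[i]] <- array(rec, c(N1, N2, N3))  # store the block as a new list entry
  i <- i + 1
}
close(to.read)
unlink(tmp)

# Combine once at the end; base-R substitute for do.call(abind, ...)
cbb <- array(unlist(cbb), c(N1, N2, N3, length(cbb)))
dim(cbb)  # 2 3 4 5
```

The number of records is never needed in advance: the loop stops at the first short read and the fourth dimension is taken from the length of the list.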
Source: https://stackoverflow.com/questions/34220552/exit-a-function-within-apply