问题
I previously posted a loop question and am trying another loop with no success. Help with trying to figure this out would be greatly appreciated. As of now, to get my work done I'm going to subset the data by year and run my original function as is, but one of the datasets I'm working with is a long time series. My original function calculates the number of fish at age for a given year dataset. This function works fine. What I would like to do is add a for loop that will allow the function to loop for all years and give the same information.
Data:
x <- structure(list(Year = c(2007, 2012, 2012, 2007, 2012, 2007, 2012,
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007,
2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012,
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007,
2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012,
2007, 2012, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012, 2012,
2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2012, 2007, 2012,
2012, 2012, 2012, 2007, 2012, 2007, 2012, 2007, 2012, 2007, 2012
), Season = c("Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall", "Fall",
"Fall", "Fall", "Fall", "Fall", "Fall", "Fall"), Length = c(6,
9, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17,
18, 18, 19, 19, 20, 20, 21, 21, 22, 22, 23, 23, 24, 24, 25, 25,
26, 26, 27, 27, 28, 28, 29, 29, 30, 30, 31, 31, 32, 32, 33, 33,
34, 34, 35, 35, 36, 37, 37, 38, 38, 39, 40, 40, 41, 42, 42, 43,
43, 44, 44, 45, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 52, 53,
54, 55, 58), Exp_number = c(2, 1, 3, 2, 2, 6, 4, 11, 6, 24, 13,
38, 41.208, 26, 77.096, 37, 227.704, 41, 276.064, 20, 276.536,
23, 277.008, 23, 72.832, 11, 66.096, 8, 43.888, 12, 13.472, 14,
2, 14, 4, 8, 4, 10, 5, 12, 2, 13, 5, 9, 2, 7, 1, 4, 3, 2, 2,
8, 2, 3, 2, 1, 3, 2, 5, 1, 8, 2, 2, 2, 1, 6, 1, 2, 1, 1, 4, 1,
3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1)), .Names = c("Year", "Season",
"Length", "Exp_number"), row.names = c(8L, 43L, 55L, 64L, 68L,
75L, 78L, 86L, 91L, 98L, 103L, 110L, 115L, 120L, 125L, 131L,
136L, 143L, 148L, 157L, 162L, 169L, 174L, 181L, 186L, 193L, 197L,
206L, 211L, 220L, 225L, 234L, 238L, 247L, 252L, 260L, 265L, 274L,
279L, 288L, 293L, 302L, 307L, 316L, 320L, 329L, 334L, 343L, 346L,
355L, 360L, 368L, 371L, 383L, 392L, 395L, 404L, 409L, 422L, 430L,
435L, 447L, 456L, 461L, 468L, 472L, 480L, 483L, 491L, 495L, 505L,
512L, 516L, 527L, 537L, 545L, 550L, 553L, 558L, 562L, 565L, 568L,
571L, 583L), class = "data.frame")
y <- structure(c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 4, 5, 30, 29, 28,
17, 8, 8, 6, 16, 26, 59, 46, 77, 89, 78, 64, 51, 34, 31, 27,
19, 30, 21, 26, 13, 26, 18, 12, 8, 9, 12, 9, 6, 13, 12, 20, 10,
14, 14, 11, 8, 10, 13, 7, 6, 4, 4, 8, 2, 2, 4, 3, 0, 2, 0, 1,
1, 1, 1, 1, 1, 0.941176470588235, 0.875, 0.625, 0.666666666666667,
0.375, 0.423076923076923, 0.423728813559322, 0.391304347826087,
0.246753246753247, 0.235955056179775, 0.153846153846154, 0.203125,
0.0980392156862745, 0.0882352941176471, 0, 0.037037037037037,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0588235294117647,
0.125, 0.375, 0.333333333333333, 0.5625, 0.5, 0.457627118644068,
0.478260869565217, 0.545454545454545, 0.561797752808989, 0.564102564102564,
0.59375, 0.647058823529412, 0.411764705882353, 0.483870967741935,
0.222222222222222, 0.157894736842105, 0.0666666666666667, 0.0476190476190476,
0.0384615384615385, 0.0769230769230769, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0625, 0.0769230769230769,
0.11864406779661, 0.130434782608696, 0.207792207792208, 0.191011235955056,
0.269230769230769, 0.171875, 0.176470588235294, 0.5, 0.483870967741935,
0.481481481481481, 0.736842105263158, 0.8, 0.619047619047619,
0.576923076923077, 0.615384615384615, 0.423076923076923, 0.277777777777778,
0.333333333333333, 0.25, 0.111111111111111, 0.166666666666667,
0.111111111111111, 0, 0, 0, 0, 0, 0.142857142857143, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0.0112359550561798, 0.0128205128205128,
0.03125, 0.0784313725490196, 0, 0.032258064516129, 0.185185185185185,
0.105263157894737, 0.1, 0.285714285714286, 0.269230769230769,
0.307692307692308, 0.5, 0.5, 0.583333333333333, 0.625, 0.555555555555556,
0.666666666666667, 0.444444444444444, 0.5, 0.538461538461538,
0.333333333333333, 0.25, 0.1, 0, 0.214285714285714, 0, 0.125,
0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037,
0, 0.0333333333333333, 0.0476190476190476, 0.115384615384615,
0, 0.0769230769230769, 0.166666666666667, 0.0833333333333333,
0.125, 0.222222222222222, 0.166666666666667, 0.444444444444444,
0.333333333333333, 0.230769230769231, 0.333333333333333, 0.4,
0.9, 0.571428571428571, 0.357142857142857, 0.545454545454545,
0.5, 0.4, 0.230769230769231, 0.142857142857143, 0.333333333333333,
0.25, 0, 0.375, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.037037037037037, 0,
0, 0, 0, 0, 0, 0.0555555555555556, 0, 0, 0.111111111111111, 0,
0, 0.166666666666667, 0.230769230769231, 0.25, 0.25, 0, 0.214285714285714,
0.214285714285714, 0.272727272727273, 0, 0.4, 0.461538461538462,
0.285714285714286, 0, 0.75, 0.5, 0.25, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0833333333333333,
0.1, 0, 0.0714285714285714, 0.142857142857143, 0.0909090909090909,
0.125, 0.1, 0.307692307692308, 0.285714285714286, 0.333333333333333,
0, 0.25, 0.25, 0, 1, 0.5, 0, 0, 0.5, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.0714285714285714, 0.0909090909090909,
0, 0, 0, 0.285714285714286, 0.333333333333333, 0, 0, 0, 0, 0,
0.25, 0.333333333333333, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.25, 0, 0, 0, 0, 0, 0.25,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.125, 0.5, 0, 0.25,
0.666666666666667, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0,
0, 0, 0, 0.5, 0, 1), .Dim = c(57L, 13L), .Dimnames = list(NULL,
c("len", "nl", "A0", "A1", "A2", "A3", "A4", "A5", "A6",
"A7", "A8", "A9", "A10")))
z <- structure(list(Length = c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60)), .Names = "Length", row.names = c(NA,
-57L), class = "data.frame")
Original function - subset x for one year to run.
subx <- subset(x,Year==2007)
myfunction <- function (x,y,z) {
#x - expanded length data by year and season
#y - alk as a percent by season
#z - length bins from alk percent data
#merge z and y to make sure both data sets have the same number of length bins
comb<-merge(z,x,by="Length",all=T)
#replace NA by column
comb$Year[is.na(comb$Year)]<-unique(x$Year)
comb$Exp_number[is.na(comb$Exp_number)] <- 0
#catch at age
catchatage<-comb[,4]*y[,3:13]
columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]])
#assign and combine age, number of fish at age and year
Number=as.data.frame(columntotal)
Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age
Year=rep(unique(x$Year),length(columntotal))
age<-as.data.frame(cbind(Year,Age,Number=Number[,1]))
#reorder age
age<-age[order(age[,2]),]
return(age)
}
myfunction(subx,y,z)
My function with a loop for unique year values - use entire x dataset.
myfunction_2 <- function (x,y,z) {
#x - expanded length data by year and season
#y - alk as a percent by season
#z - length bins from alk percent data
#loop through years in survey
#get unique year values from combined year season dataset
y_levels<-unique(x$Year)
for (i in length(y_levels)){
#subset the data
subset_data<-x$Year==y_levels[i]
#merge z and y to make sure both data sets have the same number of length bins
comb<-merge(z,subset_data,by="Length",all=T)
#replace NA by column
comb$Year[is.na(comb$Year)]<-unique(x$Year[i])
comb$Exp_number[is.na(comb$Exp_number)] <- 0
#catch at age
catchatage<-comb[,4]*y[,3:13]
columntotal<-colSums(catchatage[1:dim(catchatage)[1],1:dim(catchatage)[2]])
#assign and combine age, number of fish at age and year
Number=as.data.frame(columntotal)
Age=rownames(Number);Age<-as.numeric(gsub("A","",Age)) #get rid of the letter A in age
Year=rep(unique(subset_data$Year[i]),length(columntotal))
age<-as.data.frame(cbind(Year,Age,Number=Number[,1]))
#reorder age
age<-age[order(age[,2]),]
return(age)
}
}
myfunction_2(x,y,z)
The error message I receive is:
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
So I think my loop must not be subsetting the data by unique year.
Thank you.
回答1:
There are several thing that should be changed in order for this code to work:
subset_data<-x$Year==y_levels[i]
does not actually define a subset, it should rather besubset_data <- subset(x, Year==y_levels[i])
- in the loop,
for (i in length(y_levels))
should befor (i in 1:length(y_levels))
, otherwise the loop will only apply to year 2012 return(age)
is inside the loop when it should be outside- the results of each iteration are not combined
After correcting point 1, it should get much easier to correct the rest.
Another suggestion: your second function should use the first one, the code will be easier to read.
Finally, another way to loop without a "for" statement would be by used an lapply
(or rather do.call
, see @konvas' comment):
do.call(rbind, lapply(unique(x$Year), function(yy) myfunction(subset(x,Year==yy),y,z) ))
回答2:
Although your code should work (after fixing the errors suggested by @VincentGuillemot), you can make myfunction
a bit more readable, if you want. I think it would be worth to modify your starting data frame, right from the beginning, to include every combination of Length
and Year
(rather than doing it in myfunction
, for one year at a time).
So, since what you are after is all length-year combinations, how about something like
# create a data frame consisting of all length-year combinations
data <- expand.grid(Length = z$Length, Year = unique(x$Year))
data <- merge(data, x, all = TRUE) # merge with x
data <- merge(data, as.data.frame(y), by.x = "Length", by.y = "len") # merge with y
data$Exp_number[is.na(data$Exp_number)] <- 0 # set missing Exp_number values to 0
At this stage your data is in one data frame (instead of 3) and the missing values have been taken care of (except for the column Season
, which you don't seem to care about). I find it easier to perform analysis on d
(rather than x
, y
, z
) and you can focus on the actual computations rather than on merging data frames and replacing NA
s.
Now, your function will look something like
myfunction <- function(d) {
# d is a subset of data for a given year
catchatage <- d$Exp_number * d[grep("^A[0-9]*$", names(d))]
Number <- colSums(catchatage)
Age <- as.numeric(gsub("A", "", names(Number)))
result <- data.frame(Year = d$Year[1],
Age = Age, Number = Number)
result[order(result$Age), ]
}
I prefer using column names and grep
instead of column indices, because using indices can lead to errors that go undetected, if the order/structure of the data changes (on the other hand, the names of the variables rarely change).
It remains to apply myfunction
to subsets of data
by Year
, and combine the results. This can be done in many ways, using lapply
as @VincentGuillemot suggested in his post, or by
(or several other non-base methonds like plyr
, dplyr
, data.table
if you are interested to look into them)
do.call(rbind, by(data, list(data$Year), myfunction))
来源:https://stackoverflow.com/questions/24783860/loop-over-unique-values-r