Question
I currently have fairly long code that uses a for loop to calculate the frequency of each length at each maturity stage in a dataset. I would like to vectorize it or find a more elegant solution, but so far I have not been able to work out how. The frequency calculation is relatively simple:
(count of occurrences of a specific length at a certain maturity / total number of females or males) * 100
Example data:
Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12
Code that I'm currently using:
# Assumes the columns (Length, Sex, Maturity) are available as plain vectors,
# e.g. via attach(), and that the totals are the number of fish of each sex:
total.m <- sum(Sex == "M")
total.f <- sum(Sex == "F")
reps <- seq(min(Length), max(Length), by = 1)
# Make vectors for each maturity stage for both sexes,
# the same length as the reps vector and filled with NA for the loop:
m1 <- m2 <- m3 <- m4 <- m5 <- rep(NA, length(reps))
f1 <- f2 <- f3 <- f4 <- f5 <- rep(NA, length(reps))
# Loop:
for (i in seq_along(reps)) # repeats for each value of the x axis
{
  m1[i] <- length(Length[Length == reps[i] & Sex == "M" & Maturity == 1]) / total.m * 100
  m2[i] <- length(Length[Length == reps[i] & Sex == "M" & Maturity == 2]) / total.m * 100
  m3[i] <- length(Length[Length == reps[i] & Sex == "M" & Maturity == 3]) / total.m * 100
  m4[i] <- length(Length[Length == reps[i] & Sex == "M" & Maturity == 4]) / total.m * 100
  m5[i] <- length(Length[Length == reps[i] & Sex == "M" & Maturity == 5]) / total.m * 100
  f1[i] <- length(Length[Length == reps[i] & Sex == "F" & Maturity == 1]) / total.f * 100
  f2[i] <- length(Length[Length == reps[i] & Sex == "F" & Maturity == 2]) / total.f * 100
  f3[i] <- length(Length[Length == reps[i] & Sex == "F" & Maturity == 3]) / total.f * 100
  f4[i] <- length(Length[Length == reps[i] & Sex == "F" & Maturity == 4]) / total.f * 100
  f5[i] <- length(Length[Length == reps[i] & Sex == "F" & Maturity == 5]) / total.f * 100
}
# Stitch together the output of the loop.
males_all <- rbind(m1, m2, m3, m4, m5)
females_all <- rbind(f1, f2, f3, f4, f5)
This is the output I usually get from the loop:
mat X8 X9 X10 X11 X12 X14 X15
1 m1 0.104712 0.104712 0.6282723 1.3612565 1.884817 0.1047120 0.2094241
2 m2 0.000000 0.000000 0.3141361 0.8376963 2.198953 2.4083770 1.3612565
3 m3 0.000000 0.000000 0.0000000 0.0000000 0.104712 0.2094241 0.1047120
4 m4 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.0000000
5 m5 0.000000 0.000000 0.0000000 0.0000000 0.000000 0.0000000 0.2094241
The columns after mat are the lengths; for the sake of brevity I have not included all of them, but they go up to 30 or so. females_all looks the same, just with f1, f2, etc. in the mat column.
Answer 1:
Near as I can tell, this is what you want (prop is a proportion; multiply it by 100 to get percentages):
library(dplyr)
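# Count fish for each Sex/Maturity/Length combination
counts = count(df, Sex, Maturity, Length)
# Total number of fish of each sex
totals = count(df, Sex, name = "total")
# Attach the per-sex totals and convert counts to proportions
counts = counts %>% left_join(totals) %>%
  mutate(prop = n / total)
counts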
# # Joining, by = "Sex"
# # A tibble: 6 x 6
# Sex Maturity Length n total prop
# <fct> <int> <int> <int> <int> <dbl>
# 1 F 2 10 1 3 0.333
# 2 F 4 12 1 3 0.333
# 3 F 5 25 1 3 0.333
# 4 M 1 7 1 3 0.333
# 5 M 2 24 1 3 0.333
# 6 M 3 25 1 3 0.333
counts %>% select(Sex, Maturity, Length, prop) %>%
tidyr::spread(key = Length, value = prop, fill = 0)
# # A tibble: 6 x 7
# Sex Maturity `7` `10` `12` `24` `25`
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 F 2 0 0.333 0 0 0
# 2 F 4 0 0 0.333 0 0
# 3 F 5 0 0 0 0 0.333
# 4 M 1 0.333 0 0 0 0
# 5 M 2 0 0 0 0.333 0
# 6 M 3 0 0 0 0 0.333
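In current versions of tidyr, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same reshaping done in one pipeline, with the result scaled to percentages as in the question. It assumes dplyr and tidyr 1.1 or later are installed; the names `freq_wide` and `pct` are illustrative and not part of the original answer, and the data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# The six example rows from the question
df <- data.frame(
  Species  = "HAK",
  Sex      = c("M", "M", "F", "M", "F", "F"),
  Maturity = c(1, 2, 2, 3, 5, 4),
  Length   = c(7, 24, 10, 25, 25, 12)
)

freq_wide <- df %>%
  count(Sex, Maturity, Length) %>%       # n = fish per Sex/Maturity/Length
  group_by(Sex) %>%
  mutate(pct = n / sum(n) * 100) %>%     # sum(n) within a sex = total of that sex
  ungroup() %>%
  select(-n) %>%
  pivot_wider(names_from = Length, values_from = pct,
              values_fill = 0, names_sort = TRUE)

freq_wide
```

As with `spread()`, only lengths that actually occur in the data become columns, so lengths with no observations are absent rather than shown as zero.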
Using this data:
df = read.table(text = " Species Sex Maturity Length
1 HAK M 1 7
2 HAK M 2 24
3 HAK F 2 10
4 HAK M 3 25
5 HAK F 5 25
6 HAK F 4 12", header = TRUE)
Source: https://stackoverflow.com/questions/57463204/how-to-vectorize-length-frequency-calculation
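If the goal is to reproduce the shape of the loop's output, with one row per maturity stage and one column for every integer length from the minimum to the maximum (including lengths that never occur), a base R sketch with `table()` may be closer. The helper `length_freq` and the variable `all_lengths` are hypothetical names, and the maturity levels 1 to 5 are an assumption taken from the `m1` to `m5` vectors in the question.

```r
# The six example rows from the question
df <- data.frame(
  Species  = "HAK",
  Sex      = c("M", "M", "F", "M", "F", "F"),
  Maturity = c(1, 2, 2, 3, 5, 4),
  Length   = c(7, 24, 10, 25, 25, 12)
)

# Every integer length on the x axis, as in the question's reps vector
all_lengths <- seq(min(df$Length), max(df$Length), by = 1)

# Percentage of fish in d at each Maturity x Length combination.
# Fixed factor levels keep empty stages and lengths as zero cells.
length_freq <- function(d) {
  mats <- factor(d$Maturity, levels = 1:5)      # assumed stages 1 to 5
  lens <- factor(d$Length, levels = all_lengths)
  table(Maturity = mats, Length = lens) / nrow(d) * 100
}

# One table per sex; nrow(d) is the total number of fish of that sex
by_sex <- lapply(split(df, df$Sex), length_freq)
males_all   <- by_sex$M
females_all <- by_sex$F
```

Each result is a 5-row table whose cells sum to 100 within a sex, which matches the definition in the question: count divided by the total for that sex, times 100.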