Question
I have a dataframe that contains abundance data of many species at two locations:
site   sp1  sp2  sp3  sp4
SiteA    0   12    0    0
SiteA    0    3    0    0
SiteA    1    0    0    0
SiteB    0    0    6    0
SiteB    2    1    1    0
SiteB    0    1    0    8
I would like to calculate two things:
1. How many species are found at each site. In this dummy example, there are two species at SiteA and four species at SiteB.
2. The mean number of taxa per row at each site. In this case, 1 for SiteA and 2 for SiteB.
Answer 1:
I like using dplyr and the tidyverse packages for these sorts of summarization questions. More here:
https://dplyr.tidyverse.org/
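The question does not show how the data frame is built. A minimal sketch for reproducing it, assuming the object is called `df` and the site labels sit in a column named `site` (both names are what the code in this answer refers to, not something the question confirms):

```r
# Rebuild the example data from the question.
# Assumption: the site labels are stored in a column called `site`.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
site  sp1 sp2 sp3 sp4
SiteA 0   12  0   0
SiteA 0   3   0   0
SiteA 1   0   0   0
SiteB 0   0   6   0
SiteB 2   1   1   0
SiteB 0   1   0   8
")

str(df)  # one character column (site) and four integer species columns
```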
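library(tidyverse)

# Assumes `df` has a `site` column plus the species columns sp1..sp4.
# Start by reshaping into long (aka "tidy") format.
# (gather() still works; pivot_longer() is its newer replacement in tidyr.)
df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%  # To keep track of the original row
  gather(sp, count, sp1:sp4)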

# First question
df_tidy %>%
  # This gives total counts for all recorded combos of site and species
  count(site, sp, wt = count) %>%
  filter(n > 0) %>%
  count(site)  # Count how many rows (i.e. species) remain for each site

# A tibble: 2 x 2
#   site     nn
#   <chr> <int>
# 1 SiteA     2
# 2 SiteB     4

# Second question
df_tidy %>%
  # For each original row, count how many species had counts > 0
  count(site, obs_num, wt = count > 0) %>%
  group_by(site) %>%
  summarize(avg_taxa = mean(n))

# A tibble: 2 x 2
#   site  avg_taxa
#   <chr>    <dbl>
# 1 SiteA        1
# 2 SiteB        2
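
As a cross-check that does not depend on the tidyverse, the same two numbers can be computed in base R. This is only a sketch under the same assumption as above (site labels in a column named `site`); the helper names `sp_cols`, `site_totals`, `species_per_site` and `mean_taxa_per_row` are made up for illustration.

```r
# Example data from the question; the `site` column name is an assumption
df <- data.frame(
  site = rep(c("SiteA", "SiteB"), each = 3),
  sp1  = c(0, 0, 1, 0, 2, 0),
  sp2  = c(12, 3, 0, 0, 1, 1),
  sp3  = c(0, 0, 0, 6, 1, 0),
  sp4  = c(0, 0, 0, 0, 0, 8)
)
sp_cols <- c("sp1", "sp2", "sp3", "sp4")

# Question 1: sum the counts per site and species, then count
# how many species have a non-zero total at each site
site_totals <- rowsum(df[, sp_cols], group = df$site)
species_per_site <- rowSums(site_totals > 0)

# Question 2: number of taxa present in each row, averaged within each site
taxa_per_row <- rowSums(df[, sp_cols] > 0)
mean_taxa_per_row <- tapply(taxa_per_row, df$site, mean)

species_per_site   # SiteA: 2, SiteB: 4
mean_taxa_per_row  # SiteA: 1, SiteB: 2
```

Both results agree with the dplyr output above: two and four species, and an average of 1 and 2 taxa per row.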
Source: https://stackoverflow.com/questions/53076416/counting-presence-absence-based-on-group