counting presence/absence based on group

强颜欢笑 提交于 2021-02-10 14:48:32

问题


I have a dataframe that contains abundance data of many species at two locations:

        sp1 sp2 sp3 sp4
SiteA   0   12  0   0
SiteA   0   3   0   0
SiteA   1   0   0   0
SiteB   0   0   6   0
SiteB   2   1   1   0
SiteB   0   1   0   8

I would like to calculate two things:

  1. how many species are found at each site. In this dummy example, there are two species at SiteA and four species at SiteB.

  2. the mean number of taxa in each row for each site. In this case, 1 for SiteA and 2 for SiteB.


回答1:


I like using dplyr and the tidyverse packages for these sorts of summarization questions. More here: https://dplyr.tidyverse.org/

library(tidyverse)
# First I'd like to reshape into long (aka "tidy") format
df_tidy <- df %>%
  mutate(obs_num = row_number()) %>%  # To keep track of orig row
  gather(sp, count, sp1:sp4)

# First question
df_tidy %>%
  # This gives total counts for all recorded combos of site and species
  count(site, sp, wt = count) %>%
  filter(n > 0) %>%
  count(site)        # Count how many rows (ie species) for each site
## A tibble: 2 x 2
#  site     nn
#  <chr> <int>
#1 SiteA     2
#2 SiteB     4


# Second question
df_tidy %>%
  # Count how many observations had counts > 0 for each site
  count(site, obs_num, wt = count > 0) %>%
  group_by(site) %>%
  summarize(avg_taxa = mean(n))

## A tibble: 2 x 2
#  site  avg_taxa
#  <chr>    <dbl>
#1 SiteA        1
#2 SiteB        2


来源:https://stackoverflow.com/questions/53076416/counting-presence-absence-based-on-group

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!