How to display plant species biomass in a site by species matrix?

问题

I earlier asked "How to display two columns as binary (presence/absence) matrix?". This question received two excellent answers. I would now like to take this a step further and add a third column to the original site by species columns which reflects the biomass of each species in each plot.

Column 1 (plot) specifies code for ~ 200 plots, column 2 (species) specifies code for ~ 1200 species and Column 3 (biomass) specifies the dryweight. Each plot has > 1 species and each species can occur in > 1 plot. The total number of rows is ~ 2700.

> head(dissim)
    plot species biomass
1 a1f56r  jactom 20.2
2 a1f56r  zinunk 10.3
3 a1f56r  mikcor 0.4
4 a1f56r  rubcle 1.3
5 a1f56r  sphoos 12.4
6 a1f56r nepbis1 8.2

tail(dissim)
           plot species biomass
2707 og100m562r  selcup 4.7
2708 og100m562r  pip139 30.5
2709 og100m562r  stasum 0.1
2710 og100m562r  artani 3.4
2711 og100m562r  annunk 20.7
2712 og100m562r  rubunk 22.6

I would like to create a plot by species matrix that displays the biomass of each species in each plot (rather than a binary presence/absence matrix), something of the form:

    jactom  rubcle  chrodo  uncgla
a1f56r  1.3 0   10.3    0
a1f17r  0   22.3    0   4
a1m5r   3.2 0   3.7 9.7
a1m5r   1   0   0   20.1
a1m17r  5.4 6.9 0   1

Any advice on how to add this additional level of complexity would be very much appreciated.

回答1:

The xtabs and tapply functions return a table which is a matrix:

# Using MrFlick's example
> xtabs(~a+b,dd)
   b
a   f g h i j
  a 0 1 0 2 3
  b 0 0 2 1 0
  c 0 3 0 0 1
  d 2 2 2 1 1
  e 1 1 2 4 1

# --- the tapply solution is a bit less elegant
> dd$one=1
> with(dd, tapply(one, list(a,b), sum))
   f  g  h  i  j
a NA  1 NA  2  3
b NA NA  2  1 NA
c NA  3 NA NA  1
d  2  2  2  1  1
e  1  1  2  4  1

# If you want to make the NA's become zeros then:

> tbl <- with(dd, tapply(one, list(a,b), sum))
> tbl[is.na(tbl)] <- 0
> tbl
  f g h i j
a 0 1 0 2 3
b 0 0 2 1 0
c 0 3 0 0 1
d 2 2 2 1 1
e 1 1 2 4 1

回答2:

With sample data

set.seed(15)
dd<-data.frame(
    a=sample(letters[1:5], 30, replace=T),
    b=sample(letters[6:10], 30, replace=T)
)

if you know each occurrence only appears once you can do

with(dd, table(a,b))

#    b
# a   f g h i j
#   a 0 1 0 2 3
#   b 0 0 2 1 0
#   c 0 3 0 0 1
#   d 2 2 2 1 1
#   e 1 1 2 4 1

if they are potentially duplicated, and you only want to track presence/absence, you can do

with(unique(dd), table(a,b))
# or 
with(dd, (table(a,b)>0)+0)

#    b
# a   f g h i j
#   a 0 1 0 1 1
#   b 0 0 1 1 0
#   c 0 1 0 0 1
#   d 1 1 1 1 1
#   e 1 1 1 1 1

回答3:

You asked also about a solution when there are three variables. Below I provide two solutions that you asked for.

First, let's set up the data the data:

set.seed(15)
dd<-data.frame(
  a=sample(letters[1:5], 30, replace=T),
  b=sample(letters[6:10], 30, replace=T),
  c=sample(letters[1:3], 30, replace=T)
)

If you have three discrete variables and want only to count the occurrences, here you have a version of solution by @MrFlick:

by(dd, dd$c, function(x) with(x, table(a, b)))

And if you want average values of the third variable you can use this solution:

reshape::cast(dd, a ~ b, value = 'c', fun = mean)

来源：https://stackoverflow.com/questions/27327751/how-to-display-plant-species-biomass-in-a-site-by-species-matrix

标签

matrix

multi-dimensional-scaling