How are envfit results created?

Question

I have a question regarding how to recreate the results from the envfit() function in the vegan package.

Here is an example of envfit() being used with an ordination and an environmental vector.

library(vegan)
data(varespec)
data(varechem)
ord <- metaMDS(varespec)
chem.envfit <- envfit(ord, varechem, choices = c(1,2), permutations = 999)
chem.scores.envfit <- as.data.frame(scores(chem.envfit, display = "vectors"))
chem.scores.envfit

"The values that you see in the table are the standardised coefficients from the linear regression used to project the vectors into the ordination. These are directions for arrows of unit length." - comment from Plotted envfit vectors not matching NMDS scores

Also, from ?envfit:

The printed output of continuous variables (vectors) gives the direction cosines which are the coordinates of the heads of unit length vectors. In plot these are scaled by their correlation (square root of the column r2) so that weak predictors have shorter arrows than strong predictors. You can see the scaled relative lengths using command scores.

Could someone please show me explicitly what linear model is being run, what standardized coefficients are being used, and where cosine is being applied to create these values?

Answer 1:

I probably shouldn't have said "standardised" in that answer.

For each column (variable) in varechem and the first two axes of the ordination (choices = 1:2), the linear model is:

$$\hat{\mathrm{env}}_j = \beta_1 \, \mathrm{scr1} + \beta_2 \, \mathrm{scr2}$$

where $\mathrm{env}_j$ is the $j$th variable in varechem, scr1 and scr2 are the axis scores on the first and second axis being considered (i.e. the plane defined by choices = 1:2, but this extends to higher dimensions), and $\beta_1$ and $\beta_2$ are the regression coefficients for the pair of axis scores.

There's no intercept in this model because we centre all the variables in varechem and the axis scores, using weighted centring where weights apply. The weights really only matter for CCA, capscale(), and DCA, as those are weighted methods themselves.
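As a quick check of the "no intercept needed after centring" point, here is a minimal base R sketch on simulated data. The names scr1, scr2 and env are hypothetical stand-ins for two axis scores and one environmental variable; nothing here comes from vegan or from the varechem data, and no weights are involved.

```r
## Simulated data only: 'scr1', 'scr2' and 'env' are hypothetical stand-ins
## for two axis scores and one environmental variable
set.seed(1)
n <- 24
scr1 <- rnorm(n)
scr2 <- rnorm(n)
env <- 3 + 0.8 * scr1 - 0.5 * scr2 + rnorm(n, sd = 0.3)

## ordinary fit with an intercept on the raw data; keep only the slopes
with_int <- coef(lm(env ~ scr1 + scr2))[c("scr1", "scr2")]

## centre the response and both predictors, then fit without an intercept
cen <- data.frame(env  = env  - mean(env),
                  scr1 = scr1 - mean(scr1),
                  scr2 = scr2 - mean(scr2))
no_int <- coef(lm(env ~ scr1 + scr2 - 1, data = cen))

## the two sets of slopes agree up to floating-point error
all.equal(with_int, no_int)
```

Centring moves the fitted plane through the origin, so the intercept would be zero anyway and dropping it leaves the slopes unchanged.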

The heads of the arrows in the space spanned by the axis scores are the coefficients of that model, normalised so that the arrows have unit length (this is what I misrepresented as "standardised" in that other reply). These values (the NMDS1 and NMDS2 columns in the envfit output) are direction cosines in the sense of https://en.wikipedia.org/wiki/Direction_cosine.
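To see where the cosine comes in, the following base R sketch normalises a pair of coefficients and compares the result with the cosines of the angles the arrow makes with each axis. The coefficient values are made up for illustration and are not taken from any fitted model.

```r
## Hypothetical regression coefficients for one environmental variable
beta <- c(NMDS1 = 2, NMDS2 = -1)

## normalise to unit length: divide by the Euclidean norm
u <- beta / sqrt(sum(beta^2))

## angle of the arrow measured from the first axis
theta <- atan2(beta[["NMDS2"]], beta[["NMDS1"]])

## cosine of the angle to axis 1, and of the angle to axis 2,
## which sits a quarter turn (pi/2) away from axis 1
cosines <- c(cos(theta), cos(pi / 2 - theta))

## the normalised coefficients are exactly these cosines
all.equal(unname(u), cosines)

## and their squares sum to 1, i.e. the arrow has unit length
sum(u^2)
```

So no cosine function is called explicitly in the fitting; dividing the coefficients by their norm is what turns them into direction cosines.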

Here's a simplified walk-through of what we do when there are no weights involved and all the variables in env are numeric, as in your example. (Note that we don't actually do it this way, for efficiency reasons: see the code behind vectorfit() for the QR decomposition used if you really want the details.)

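## extract the site scores for the axes we want, 1 and 2
## (display = "sites" is given explicitly so that a matrix of site scores is returned)
scrs <- scores(ord, display = "sites", choices = c(1,2))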

## centre the scores (note not standardising them)
scrs <- as.data.frame(scale(scrs, scale = FALSE, center = TRUE))

## centre the environmental variables - keep as matrix
env <- scale(varechem, scale = FALSE, center = TRUE)

## fit the linear models with no intercept
mod <- lm(env ~ NMDS1 + NMDS2 - 1, data = scrs)

## extract the coefficients from the models
betas <- coef(mod)

## normalize coefs to unit length
##   i.e. betas for a particular env var have sum of squares = 1
t(sweep(betas, 2L, sqrt(colSums(betas^2)), "/"))

The last line gives:

> t(sweep(betas, 2L, sqrt(colSums(betas^2)), "/"))
               NMDS1      NMDS2
N        -0.05731557 -0.9983561
P         0.61972792  0.7848167
K         0.76646744  0.6422832
Ca        0.68520442  0.7283508
Mg        0.63252973  0.7745361
S         0.19139498  0.9815131
Al       -0.87159427  0.4902279
Fe       -0.93600826  0.3519780
Mn        0.79870870 -0.6017179
Zn        0.61755690  0.7865262
Mo       -0.90308490  0.4294621
Baresoil  0.92487118 -0.3802806
Humdepth  0.93282052 -0.3603413
pH       -0.64797447  0.7616621

which replicates (except for showing more significant figures) the values returned by envfit() in this case:

> chem.envfit

***VECTORS

            NMDS1    NMDS2     r2 Pr(>r)    
N        -0.05732 -0.99836 0.2536  0.045 *  
P         0.61973  0.78482 0.1938  0.099 .  
K         0.76647  0.64228 0.1809  0.095 .  
Ca        0.68520  0.72835 0.4119  0.006 ** 
Mg        0.63253  0.77454 0.4270  0.003 ** 
S         0.19139  0.98151 0.1752  0.109    
Al       -0.87159  0.49023 0.5269  0.002 ** 
Fe       -0.93601  0.35198 0.4450  0.002 ** 
Mn        0.79871 -0.60172 0.5231  0.002 ** 
Zn        0.61756  0.78653 0.1879  0.100 .  
Mo       -0.90308  0.42946 0.0609  0.545    
Baresoil  0.92487 -0.38028 0.2508  0.061 .  
Humdepth  0.93282 -0.36034 0.5201  0.001 ***
pH       -0.64797  0.76166 0.2308  0.067 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Permutation: free
Number of permutations: 999
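The r2 column ties back to the ?envfit passage quoted in the question: the unit-length arrows are multiplied by sqrt(r2) before plotting, and those scaled values are what scores() reports. A small base R sketch of that scaling follows, using three rows copied (already rounded) from the printed output above; it does not call vegan, so treat it as an illustration of the arithmetic rather than of the package internals.

```r
## direction cosines and r2 copied from the printed envfit() output
## for three of the variables (rounded as printed)
arrows <- rbind(N        = c(NMDS1 = -0.05732, NMDS2 = -0.99836),
                Al       = c(NMDS1 = -0.87159, NMDS2 =  0.49023),
                Humdepth = c(NMDS1 =  0.93282, NMDS2 = -0.36034))
r2 <- c(N = 0.2536, Al = 0.5269, Humdepth = 0.5201)

## scale each unit-length arrow by sqrt(r2);
## the vector recycles down the rows, so row i is multiplied by sqrt(r2[i])
scaled <- arrows * sqrt(r2)

## length of each scaled arrow; this matches sqrt(r2) up to rounding
arrow_len <- sqrt(rowSums(scaled^2))
round(cbind(scaled, length = arrow_len), 4)
```

This is why weak predictors such as N get short arrows and strong ones such as Al or Humdepth get long arrows, even though all three have unit length in the printed table.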

Source: https://stackoverflow.com/questions/60953996/how-are-envfit-results-created

Tags

regression

vegan