PCA-LDA analysis - R | 易学教程

问题

In this example (https://gist.github.com/thigm85/8424654) LDA was examined vs. PCA on iris dataset. How can I also do LDA on the PCA results (PCA-LDA) ?

Code:

require(MASS)
require(ggplot2)
require(scales)
require(gridExtra)

pca <- prcomp(iris[,-5],
              center = TRUE,
              scale. = TRUE) 

prop.pca = pca$sdev^2/sum(pca$sdev^2)

lda <- lda(Species ~ ., 
           iris, 
           prior = c(1,1,1)/3)

prop.lda = lda$svd^2/sum(lda$svd^2)

plda <- predict(object = lda,
                newdata = iris)

dataset = data.frame(species = iris[,"Species"],
                     pca = pca$x, lda = plda$x)

p1 <- ggplot(dataset) + geom_point(aes(lda.LD1, lda.LD2, colour = species, shape = species), size = 2.5) + 
  labs(x = paste("LD1 (", percent(prop.lda[1]), ")", sep=""),
       y = paste("LD2 (", percent(prop.lda[2]), ")", sep=""))

p2 <- ggplot(dataset) + geom_point(aes(pca.PC1, pca.PC2, colour = species, shape = species), size = 2.5) +
  labs(x = paste("PC1 (", percent(prop.pca[1]), ")", sep=""),
       y = paste("PC2 (", percent(prop.pca[2]), ")", sep=""))

grid.arrange(p1, p2)

回答1:

Usually you do PCA-LDA to reduce the dimensions of your data before performing PCA. Ideally you decide the first k components to keep from the PCA. In your example with iris, we take the first 2 components, otherwise it will look pretty much the same as without PCA.

Try it like this:

pcdata = data.frame(pca$x[,1:2],Species=iris$Species)
pc_lda <- lda(Species ~ .,data=pcdata , prior = c(1,1,1)/3)
prop_pc_lda = pc_lda$svd^2/sum(pc_lda$svd^2)
pc_plda <- predict(object = pc_lda,newdata = pcdata)

dataset = data.frame(species = iris[,"Species"],pc_plda$x)

p3 <- ggplot(dataset) + geom_point(aes(LD1, LD2, colour = species, shape = species), size = 2.5) + 
  labs(x = paste("LD1 (", percent(prop_pc_lda[1]), ")", sep=""),
       y = paste("LD2 (", percent(prop_pc_lda[2]), ")", sep=""))

print(p3)

You don't see much of a difference here because the first 2 components of the PCA captures most of the variance in the iris dataset.

回答2:

This is very simple, apply lda to the principal components coordinates returned by princomp in the question's code.

pca_lda <- lda(pca$x, grouping = iris$Species)

Now it is a matter of using the methods predict for each object type to get the classifications' accuracies.

pred_pca_lda <- predict(lda0, predict(pca, iris))

accuracy_lda <- mean(plda$class == iris$Species)
accuracy_pca_lda <- mean(pred_pca_lda$class == iris$Species)

accuracy_lda
#[1] 0.98
accuracy_pca_lda
#[1] 0.98

来源：https://stackoverflow.com/questions/62573178/pca-lda-analysis-r

标签

pca