Question
I have a large file (a small subset is shown below) that I would like to visualize. I was able to draw a PCA plot using the PCA function, but it looks messy because I have 200 columns, so I think t-SNE or UMAP might work better. However, I couldn't get a plot out of either.
I would like to show the relationships and clustering between the columns (by column name) in a plot. I collected A, B, ... from different studies, and I want to check whether there is any batch effect between them.
Any help would be appreciated!
DF:
A B C D
1:540450-541070 0.12495878 0.71580434 0.65399319 1.04879290
1:546500-548198 0.41064192 0.26136554 0.11939805 0.28721360
1:566726-567392 0.00000000 0.06663644 0.45661687 0.24408844
1:569158-570283 0.34433086 0.27614141 0.54063437 0.21675053
1:603298-605500 0.07036734 0.42324126 0.23017472 0.29530045
1:667800-669700 0.20388011 0.11678913 0.00000000 0.12833913
1:713575-713660 7.29171225 12.53078648 2.38515165 3.82500941
1:724497-727160 0.40730086 0.26664585 0.45678834 0.12209005
1:729399-731900 0.74345727 0.49685579 0.72956458 0.32499580
Answer 1:
Here are some examples using the iris dataset, since your example data is too small for these dimensionality reduction methods.
For t-SNE:
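library(ggplot2)
library(Rtsne)

dat <- iris
# Rtsne() stops on duplicate rows by default, so drop them first;
# column 5 (Species) is the label, not a feature
tsne <- Rtsne(dat[!duplicated(dat), -5])
df <- data.frame(x = tsne$Y[, 1],
                 y = tsne$Y[, 2],
                 Species = dat[!duplicated(dat), 5])
ggplot(df, aes(x, y, colour = Species)) +
  geom_point()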
For UMAP:
library(umap)

# Store the result under a name that does not shadow the umap() function
umap_res <- umap(dat[!duplicated(dat), -5])
df <- data.frame(x = umap_res$layout[, 1],
                 y = umap_res$layout[, 2],
                 Species = dat[!duplicated(dat), 5])
ggplot(df, aes(x, y, colour = Species)) +
  geom_point()
EDIT: Suppose we have data where every subject is a column:
dat <- t(mtcars)
The only extra steps are to transpose the data before feeding it to t-SNE/UMAP and to copy the column names into the plotting data:
# The default perplexity (30) is too large for only 32 subjects and
# Rtsne() rejects it, so set a smaller value
tsne <- Rtsne(t(dat), perplexity = 5)
df <- data.frame(x = tsne$Y[, 1],
                 y = tsne$Y[, 2],
                 car = colnames(dat))
ggplot(df, aes(x, y, colour = car)) +
  geom_point()
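The same approach should carry over to UMAP, and for a batch-effect check it is probably more readable to colour by study than by individual column name. Below is a minimal sketch under some assumptions: `study` is a stand-in grouping (the cylinder count from mtcars plays the role of a study label, since the real labels are not shown in the question), and `n_neighbors = 5` and `random_state = 42` are illustrative settings rather than tuned values.

```r
library(ggplot2)
library(umap)

dat <- t(mtcars)  # columns are the subjects, as in the EDIT above

# umap() expects one row per subject, so transpose back.
# n_neighbors and random_state are illustrative choices, not recommendations.
um <- umap(t(dat), n_neighbors = 5, random_state = 42)

# ASSUMPTION: stand-in grouping. With real data, replace this with a
# vector giving the study each column came from.
study <- factor(dat["cyl", ])

df <- data.frame(x = um$layout[, 1],
                 y = um$layout[, 2],
                 name = colnames(dat),
                 study = study)

# Colour by study; label each point with its column name
ggplot(df, aes(x, y, colour = study, label = name)) +
  geom_point() +
  geom_text(vjust = -0.8, size = 3, show.legend = FALSE)
```

If the points separate mainly by study rather than by any biological grouping, that would hint at a batch effect, although t-SNE and UMAP layouts depend on their parameters and should be read with some caution.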
Source: https://stackoverflow.com/questions/58593213/is-there-any-way-to-draw-umap-or-t-sne-plot-for-data-table