Question
I want to retrieve several sequences from my DNAStringSet. So far I can only retrieve a single sequence.
For example, here is a DNAStringSet and the list of sequence names I want to isolate.
Test set:
aDNAStringSet <- DNAStringSet(c("GCATCCATTAC", "AATCGCCATCC", "GCATACCTTAC", "GCATACCTTAC", "GCATACCTTAC"))
Names:
names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5")
The list of sequences to isolate:
patterns <- c("seq2", "seq4", "seq5")
What I tested so far:
selection <- aDNAStringSet[grep("seq2", names(aDNAStringSet))]
or
selection <- aDNAStringSet[grep(patterns, names(aDNAStringSet))]
grep works, but only for a single sequence.
sapply and match don't work.
Using sapply:
selection <- aDNAStringSet[unlist(sapply(patterns, grep, aDNAStringSet$names)), ]
or using match:
selection <-match(c("seq2", "seq4", "seq5"), aDNAStringSet$names)
I want a DNAStringSet containing only "seq2", "seq4", and "seq5". Any ideas? Thanks, K
Answer 1:
You can do
aDNAStringSet[names(aDNAStringSet) %in% patterns]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
Or using match
aDNAStringSet[sapply(patterns, function(x) match(x, names(aDNAStringSet)))]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
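The %in% and match approaches give the same result here only because patterns is already in the same order as the set. A minimal sketch of the difference follows, using a plain named character vector as a stand-in for the DNAStringSet so that it runs without Biostrings. The names seqs and wanted, and the missing name "seq9", are made up for illustration.

```r
# Stand-in for the DNAStringSet: a plain named character vector
# (an assumption, chosen so the sketch runs in base R).
seqs <- c(seq1 = "GCATCCATTAC", seq2 = "AATCGCCATCC", seq3 = "GCATACCTTAC",
          seq4 = "GCATACCTTAC", seq5 = "GCATACCTTAC")

# Requested names, deliberately not in the same order as in seqs.
wanted <- c("seq5", "seq2", "seq4")

# Logical indexing: elements come back in the order they have in seqs.
by_in <- seqs[names(seqs) %in% wanted]

# Positional indexing: elements come back in the order given in wanted.
# match() is vectorised, so no sapply() wrapper is needed.
by_match <- seqs[match(wanted, names(seqs))]

print(names(by_in))
print(names(by_match))

# match() returns NA for a name that is absent ("seq9" is hypothetical),
# so drop the NAs before using the result as an index.
idx <- match(c("seq2", "seq9"), names(seqs))
present <- seqs[idx[!is.na(idx)]]
print(names(present))
```

If the output should follow the order of the requested names, match is the one to use. If only membership matters, %in% is simpler and silently skips names that are not present.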
Or, if you prefer grep (for regexp matching):
aDNAStringSet[sapply(patterns, function(x) grep(x, names(aDNAStringSet)))]
# A DNAStringSet instance of length 3
# width seq names
#[1] 11 AATCGCCATCC seq2
#[2] 11 GCATACCTTAC seq4
#[3] 11 GCATACCTTAC seq5
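One caveat with grep is that an unanchored pattern matches substrings, so "seq1" would also hit a name such as "seq10". The sketch below shows one possible way around this, assuming exact names are wanted: collapse the patterns into a single anchored regular expression. It works on a plain character vector of names, and "seq10" is a hypothetical extra name added for illustration.

```r
# Hypothetical name vector; "seq10" is added to show the substring problem.
seq_names <- c("seq1", "seq2", "seq3", "seq4", "seq5", "seq10")
patterns <- c("seq1", "seq4")

# Unanchored, one grep() call per pattern: "seq1" also matches "seq10".
loose <- unlist(lapply(patterns, grep, x = seq_names))

# Single anchored alternation, here "^(seq1|seq4)$": whole-name matches only.
regex <- paste0("^(", paste(patterns, collapse = "|"), ")$")
strict <- grep(regex, seq_names)

print(loose)
print(strict)
```

The resulting integer positions can be used as a subscript in the same way as in the grep example above. This assumes the names contain no regex metacharacters; otherwise they would need escaping first.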
Source: https://stackoverflow.com/questions/54306155/subsetting-defined-group-out-of-dnastringset