Subsetting defined group out of DNAStringSet

别说谁变了你拦得住时间么, submitted on 2019-12-11 16:33:44

Question


I want to retrieve several sequences from my DNAStringSet by name. So far I have only managed to retrieve a single sequence.

For example, here is a DNAStringSet and the list of sequence names I want to isolate.

Testset:

library(Biostrings)
aDNAStringSet <- DNAStringSet(c("GCATCCATTAC", "AATCGCCATCC", "GCATACCTTAC", "GCATACCTTAC", "GCATACCTTAC"))

Names:

names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5") 

The list of sequences to isolate:

patterns <- c("seq2", "seq4", "seq5")   

What I tested so far:

selection <- aDNAStringSet[grep("seq2", names(aDNAStringSet))]

or

selection <- aDNAStringSet[grep(patterns, names(aDNAStringSet))]

grep works, but only for a single sequence: given the vector patterns, it uses only the first element ("seq2").
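The reason only one sequence comes back is that grep takes a single regular expression: when pattern has length greater than one, R warns and uses the first element only. A minimal sketch of that behaviour, and of one possible workaround (collapsing the names into a single anchored alternation), follows. It uses a plain character vector, seq_names, introduced here as a stand-in for names(aDNAStringSet), so it runs without Biostrings.

```r
# Stand-in for names(aDNAStringSet); the name seq_names is introduced for this sketch
seq_names <- c("seq1", "seq2", "seq3", "seq4", "seq5")
patterns <- c("seq2", "seq4", "seq5")

# With several patterns, grep() warns and uses only the first one ("seq2")
first_only <- suppressWarnings(grep(patterns, seq_names))

# Workaround: build one regular expression, ^(seq2|seq4|seq5)$
combined <- paste0("^(", paste(patterns, collapse = "|"), ")$")
all_hits <- grep(combined, seq_names)
```

Here first_only holds a single position while all_hits holds one position per requested name, so all_hits could serve as the subscript for the DNAStringSet.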

sapply and match don't work:

Using sapply:

selection <- aDNAStringSet[unlist(sapply(patterns, grep, aDNAStringSet$names)), ]

or using match:

selection <- match(c("seq2", "seq4", "seq5"), aDNAStringSet$names)

I want a DNAStringSet containing only "seq2", "seq4" and "seq5". Any ideas? Thanks, K


Answer 1:


You can test the names against patterns with %in%:

aDNAStringSet[names(aDNAStringSet) %in% patterns]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5    
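One thing to keep in mind is that a logical subscript built with %in% returns the sequences in the order they occur in the set, not in the order they are listed in patterns. A related option that is not part of the original answer is to index by name with a character vector, which follows the requested order. The sketch below assumes the Biostrings package is installed; the vector wanted is a hypothetical selection chosen so that the two orders differ.

```r
library(Biostrings)  # Bioconductor package, assumed to be installed

aDNAStringSet <- DNAStringSet(c("GCATCCATTAC", "AATCGCCATCC", "GCATACCTTAC",
                                "GCATACCTTAC", "GCATACCTTAC"))
names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5")

# Hypothetical selection, deliberately not in set order
wanted <- c("seq5", "seq2")

# Logical subscript: result keeps the order of the set (seq2, then seq5)
by_in <- aDNAStringSet[names(aDNAStringSet) %in% wanted]

# Character subscript: result follows the order of wanted (seq5, then seq2)
by_name <- aDNAStringSet[wanted]
```

Which form is preferable depends on whether the output should mirror the original set or the list of requested names.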

Or using match:

aDNAStringSet[sapply(patterns, function(x) match(x, names(aDNAStringSet)))]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5
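Since match is already vectorised over its first argument, the sapply wrapper is optional. The sketch below shows the single-call form on a plain character vector (seq_names, a stand-in for names(aDNAStringSet)) and how a missing name shows up as NA. The name "seq9" is hypothetical and deliberately absent from the set.

```r
# Stand-in for names(aDNAStringSet)
seq_names <- c("seq1", "seq2", "seq3", "seq4", "seq5")

# "seq9" is a hypothetical name that does not exist in the set
wanted <- c("seq5", "seq2", "seq9")

# One vectorised call: first position of each wanted name, NA if absent
idx <- match(wanted, seq_names)

# Drop names that were not found before using the index as a subscript
idx_found <- idx[!is.na(idx)]
picked <- seq_names[idx_found]
```

The cleaned index can then be used as aDNAStringSet[idx_found]; the result follows the order of wanted rather than the order of the set.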

Or, if you prefer grep (for regular-expression matching):

aDNAStringSet[sapply(patterns, function(x) grep(x, names(aDNAStringSet)))]
#  A DNAStringSet instance of length 3
#    width seq                                               names
#[1]    11 AATCGCCATCC                                       seq2
#[2]    11 GCATACCTTAC                                       seq4
#[3]    11 GCATACCTTAC                                       seq5
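A caveat with the grep variant is that the names are treated as unanchored regular expressions, so "seq2" would also match a name such as "seq20". When patterns match different numbers of names, sapply also returns a list rather than a vector. The sketch below, on a plain character vector with a hypothetical extra name "seq20", shows one way to guard against both by anchoring each pattern and flattening the result with unlist.

```r
# Stand-in for names(aDNAStringSet); "seq20" is a hypothetical extra name
seq_names <- c("seq1", "seq2", "seq3", "seq4", "seq5", "seq20")
patterns <- c("seq2", "seq4", "seq5")

# Unanchored: "seq2" also hits "seq20", so four positions come back
per_pattern <- lapply(patterns, function(p) grep(p, seq_names))
idx_loose <- sort(unique(unlist(per_pattern)))

# Anchored with ^ and $: only exact names match
anchored <- paste0("^", patterns, "$")
idx_exact <- sort(unique(unlist(lapply(anchored, function(p) grep(p, seq_names)))))
```

For exact names, %in% or match is simpler; the grep route mainly pays off when the selection really is a pattern, such as every name starting with a given prefix.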


Source: https://stackoverflow.com/questions/54306155/subsetting-defined-group-out-of-dnastringset
