Trying to return a specified number of characters from a gene sequence in R

旧街凉风 submitted on 2019-12-04 01:42:02

Question


I have a DNA sequence like: cgtcgctgtttgtcaaagtcg....

that is possibly 1000+ letters long.

However, I only want to look at letters 5 to 200, for example, and to define this subset of the string as a new object.

I tried looking at the nchar function, but haven't found something that would do this.


Answer 1:


Try

substr("cgtcgctgtttgtcaa[...]", 5, 200)

See substr().
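
To make the "new object" part of the question concrete, here is a minimal sketch. The sequence is a made-up repeat built only for illustration (the asker's real data is not shown), and the names `dna` and `fragment` are hypothetical.

```r
# Hypothetical sequence built for illustration; not the asker's real data.
dna <- paste(rep("cgtcgctgtttgtcaaagtcg", 50), collapse = "")
total_length <- nchar(dna)        # 21 letters * 50 repeats

# Keep characters 5 through 200 (1-based, both ends inclusive) as a new object.
fragment <- substr(dna, 5, 200)
fragment_length <- nchar(fragment)  # 200 - 5 + 1 letters
```

Note that nchar() only reports the length of a string; it is substr() that extracts the range.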




Answer 2:


Use the substr() function:

> tmp.string <- paste(LETTERS, collapse="")
> tmp.string <- substr(tmp.string, 4, 10)
> tmp.string
[1] "DEFGHIJ"
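
Two related behaviours may be worth knowing here; the following is a small sketch using the same alphabet string, with `x`, `tail_part` and `windows` as illustrative names. A stop position beyond the end of the string is simply truncated, and the closely related base function substring() recycles its arguments, so one string can yield several windows in a single call.

```r
x <- paste(LETTERS, collapse = "")   # the 26 upper-case letters as one string

# A stop position past the end of the string is truncated, not an error.
tail_part <- substr(x, 20, 1000)     # characters 20 through 26

# substring() recycles its arguments to the longest one,
# so two start/stop pairs give two windows from the same string.
windows <- substring(x, c(1, 4), c(3, 10))
```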



Answer 3:


See also the Bioconductor package Biostrings, which is a good choice if you need to handle large biological sequences or sets of sequences.

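# Install once (biocLite has been replaced by BiocManager):
# install.packages("BiocManager"); BiocManager::install("Biostrings")
library(Biostrings)
s <- paste(rep("gtcgctgtttgtcaac", 20), collapse = "")  # 320-letter example sequence
d <- DNAString(s)
d[5:200]                # letters 5 to 200 as a DNAString
as.character(d[5:200])  # the same range as a plain character string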


Source: https://stackoverflow.com/questions/1489788/trying-to-return-a-specified-number-of-characters-from-a-gene-sequence-in-r
