Loading FASTA file in R faster than when using read.fasta() from seqinr

Submitted by 耗尽温柔 on 2021-01-29 10:13:47

Question


I am currently using the function read.fasta() from the R package seqinr.
I think that creating an index file would already make reading faster, but I was wondering whether there is another function that loads the file faster.

I looked at the function read.big.fasta() from PopGenome, but the package has been removed from CRAN and Bioconductor, so I am no longer sure about it. Any advice?


Answer 1:


You can use readDNAStringSet from Biostrings.

Download the human genome (hg38) as a test file:

download.file("https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz","../Downloads/test.fa.gz")

Benchmark readDNAStringSet against read.fasta:

library(Biostrings)
library(seqinr)

# f1 reads the file with Biostrings, f2 with seqinr
f1 = function(){readDNAStringSet("../Downloads/test.fa.gz")}
f2 = function(){read.fasta("../Downloads/test.fa.gz")}

microbenchmark::microbenchmark(f1(),times=5)
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 f1() 42.82203 43.57036 45.10369 45.64206 46.37412 47.10987     5

microbenchmark::microbenchmark(f2(),times=5)
### did not finish running
### so read.fasta is definitely not an option for large FASTA files
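
As a small illustration of the Biostrings approach, the sketch below writes a tiny made-up FASTA file (the sequence names and contents are invented for the example), reads it with readDNAStringSet, and then uses fasta.index to load only selected records, which may be relevant to the index idea raised in the question. It assumes Biostrings is installed from Bioconductor and that the file is uncompressed; it is not a benchmark.

```r
library(Biostrings)

# Write a tiny, made-up FASTA file so the example is self-contained.
fasta_path <- tempfile(fileext = ".fa")
writeLines(
  c(">seq1", "ACGTACGT",
    ">seq2", "GGGCCC",
    ">seq3", "TTTTAAAACC"),
  fasta_path
)

# Read every record into a DNAStringSet.
all_seqs <- readDNAStringSet(fasta_path)
names(all_seqs)   # record names taken from the header lines
width(all_seqs)   # sequence lengths

# Build an index of the records, then load only the ones needed
# by passing the selected rows of the index to readDNAStringSet.
fai <- fasta.index(fasta_path, seqtype = "DNA")
subset_seqs <- readDNAStringSet(fai[c(1, 3), ])
names(subset_seqs)
```

The index is a plain data.frame with one row per record, so the usual row subsetting can be used to pick which sequences to load instead of reading the whole file.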


Source: https://stackoverflow.com/questions/59792855/loading-fasta-file-in-r-faster-than-when-using-read-fasta-from-seqinr
