Faster way to split a string and count characters using R?

太阳男子 2021-02-01 08:51

I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the let
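A base-R sketch, not taken from the thread. It assumes each sequence is a plain character string and that GC content means (G + C count) / total length, so ambiguous bases such as N count in the denominator. It skips splitting into single characters: `gsub()` deletes every G/C and `nchar()` measures how much shorter the string got, working on a whole vector of sequences at once. `gc_content` is a hypothetical helper name. Whether it is faster than your current approach depends on your data, so benchmark it.

```r
# Hypothetical helper (not from the thread): GC fraction per string,
# assuming GC content = (number of G/C characters) / (total characters).
gc_content <- function(seqs) {
  total <- nchar(seqs)
  # Remove every G or C (either case); the drop in length is the GC count
  without_gc <- nchar(gsub("[GCgc]", "", seqs))
  (total - without_gc) / total
}

# Works on many sequences at once; roughly 0.476, 1, 0
gc_content(c("agtctggggggccccttttaagtagatagatagctagtcgta", "GGCC", "ATAT"))
```

An empty string returns NaN (0/0), so filter those out first if your file can contain them.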

6 Answers
  •  轮回少年
    2021-02-01 09:23

    I don't know whether it's any faster, but you might want to look at the R package seqinR - http://pbil.univ-lyon1.fr/software/seqinr/home.php?lang=eng. It is an excellent general-purpose bioinformatics package with many methods for sequence analysis. It's on CRAN (which seems to be down as I write this).

    GC content would be:

    library(seqinr)  # install.packages("seqinr") if it is not installed yet
    mysequence <- s2c("agtctggggggccccttttaagtagatagatagctagtcgta")
    GC(mysequence)  # 0.4761905

    That works on a single string; you can also read sequences from a FASTA file using "read.fasta()".
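    As a base-R cross-check of the seqinR example (my own sketch, not seqinR's implementation), you can split the same string into single characters with `strsplit()`, roughly what `s2c()` does, and count the bases with `table()`. For a sequence that contains only a/c/g/t this gives the same 0.4761905 (20 G/C out of 42 bases).

```r
# Base-R cross-check (assumption: sequence contains only a, c, g, t)
s <- "agtctggggggccccttttaagtagatagatagctagtcgta"
chars <- strsplit(s, "")[[1]]   # one element per base
# Fixed levels so every base appears in the table, even with a count of 0
counts <- table(factor(chars, levels = c("a", "c", "g", "t")))
gc <- sum(counts[c("g", "c")]) / length(chars)
print(counts)
print(gc)                       # 0.4761905
```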
