Read table with comment lines starting with “##”

Submitted by 两盒软妹~` on 2020-06-27 07:51:09

Question


I'm struggling to read my tables in Variant Call Format (VCF) with R. Each file has some comment lines starting with ##, followed by a header line starting with #.

## contig=<ID=OTU1431,length=253>
## contig=<ID=OTU915,length=253>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  /home/sega/data/bwa/reads/0015.2142.fastq.q10sorted.bam
Eubacterium_ruminantium_AB008552    56  .   C   T   228 .   DP=212;AD=0,212;VDB=0;SGB=-0.693147;MQ0F=0;AC=2;AN=2;DP4=0,0,0,212;MQ=59    GT:PL   1/1:255,255,0

How can I read such a table without losing the header? Using read.table() with comment.char = "##" returns the error "invalid 'comment.char' argument".
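Both symptoms can be reproduced with base R alone. The sketch below writes a hypothetical, shortened tab-separated sample (only the first five VCF columns) to a temporary file. It shows that comment.char must be a single character, so "##" is rejected, and that the default comment.char = "#" also discards the #CHROM header line, so the first data row is used as the column names.

```r
# Hypothetical sample: two "##" meta lines, the "#CHROM" header, one data row.
vcf_lines <- c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  paste("#CHROM", "POS", "ID", "REF", "ALT", sep = "\t"),
  paste("Eubacterium_ruminantium_AB008552", "56", ".", "C", "T", sep = "\t")
)
path <- tempfile(fileext = ".vcf")
writeLines(vcf_lines, path)

# comment.char must be a single character, so "##" raises an error.
err <- tryCatch(read.table(path, comment.char = "##"),
                error = function(e) conditionMessage(e))
print(err)

# With the default comment.char = "#", the "#CHROM" line is treated as a
# comment as well, so the data row becomes the header and no rows remain.
no_header <- read.table(path, header = TRUE, sep = "\t")
print(names(no_header))
```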


Answer 1:


If you want to read VCF files, you can also try readVcf() from the Bioconductor package VariantAnnotation: https://bioconductor.org/packages/release/bioc/html/VariantAnnotation.html

Otherwise, I highly recommend the fread() function from the data.table package. Its skip argument also accepts a string, in which case fread() starts reading at the first line that contains that substring.

For example, this should work:

library(data.table)
fread("test.vcf", skip = "CHROM")
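If installing a package is not an option, a base-R sketch along the same lines is to drop the "##" lines yourself and pass the rest to read.table(). The function name read_vcf_body() is hypothetical, and the sketch assumes a tab-separated file (as VCF requires) whose first non-"##" line is the #CHROM header.

```r
# Hypothetical helper: read a VCF body with base R, keeping the header line.
read_vcf_body <- function(path) {
  lines <- readLines(path)
  # Drop the "##" meta-information lines; the "#CHROM" header line survives.
  body <- lines[!startsWith(lines, "##")]
  # Strip the leading "#" so the first column is named CHROM.
  body[1] <- sub("^#", "", body[1])
  # comment.char = "" disables comment handling so no other line is dropped.
  # All columns are read as text: type.convert() would otherwise turn an ALT
  # column containing only "T" into logical TRUE.
  vcf <- read.table(text = body, header = TRUE, sep = "\t",
                    comment.char = "", quote = "",
                    colClasses = "character", check.names = FALSE)
  vcf$POS <- as.integer(vcf$POS)  # POS is always an integer position
  vcf
}

# Usage with a shortened, hypothetical sample file.
path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##contig=<ID=OTU1431,length=253>",
  "##contig=<ID=OTU915,length=253>",
  paste("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", sep = "\t"),
  paste("Eubacterium_ruminantium_AB008552", "56", ".", "C", "T", "228",
        sep = "\t")
), path)
vcf <- read_vcf_body(path)
print(vcf)
```

Since readLines() loads the whole file into memory, the fread() or readVcf() routes above are likely the better choice for large VCFs.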



Source: https://stackoverflow.com/questions/42370218/read-table-with-comment-lines-starting-with
