fasta

How to read a FASTA file in Python?

时光怂恿深爱的人放手 posted on 2021-02-07 07:23:16
Question: I'm trying to read a FASTA file, find a specific motif (a string), and print out each sequence and the number of times the motif occurs in it. A FASTA file is just a series of sequences (strings), each introduced by a header line; the signature of a header, and so of the start of a new sequence, is ">". The sequence of letters starts on the line immediately after the header. My code isn't finished, but so far it gives me this error: AttributeError: 'str' object has no attribute 'next'. I'm not sure what's wrong.
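The question's code is not shown in this excerpt, so the exact cause can't be confirmed. One thing that is certain is that a str has no .next() method, and in Python 3 iterators are advanced with the built-in next(obj) rather than obj.next(). The sketch below sidesteps the issue by looping over the file handle directly. The names read_fasta and count_motif and the sample data are assumptions made for illustration, not the asker's code.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, including overlapping ones."""
    count, start = 0, sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


# Hypothetical two-record file held in memory instead of on disk.
example = io.StringIO(">seq1\nATGATGA\nTGCC\n>seq2\nCCCC\n")
results = [(h, s, count_motif(s, "ATG")) for h, s in read_fasta(example)]
for header, sequence, hits in results:
    print(header, sequence, hits)
```

With a real file, the io.StringIO object would be replaced by a handle from open(path). Note that str.count would give a different answer for motifs that can overlap themselves, which is why the sketch searches position by position.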

Reading a FASTA file

荒凉一梦 posted on 2021-02-05 09:15:47
Question: I want to convert the following line of a file into JSON so that I can save it into a mongoose schema:
    >HWI-ST700660_96:2:1101:1455:2154#5@0/1 GAA…..GAATG
It should become:
    {">HWI-ST700660_96:2:1101:1455:2154#5@0/1": "GAA…..GAATG"}
I have tried several options (one sample is below) without success. Any suggestions?
    const parser = require("csv-parse/lib/sync"); // import parser
    const fs = require("fs"); // import file reader
    const path = require("path"); // for joining paths
    const sourceData = fs.readFileSync(path
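The question is about Node.js, but the transformation itself is language-independent. As a rough sketch in Python, assuming the header and the sequence are separated by whitespace (a space or a newline), the record can be split once and turned into a one-key object. The function name fasta_line_to_dict and the read name used below are hypothetical.

```python
import json


def fasta_line_to_dict(line):
    """Split 'header sequence' on the first run of whitespace into {header: sequence}."""
    header, sequence = line.split(None, 1)
    return {header: sequence.strip()}


# Hypothetical record; the real read name and sequence would come from the file.
line = ">read_1/1 GAATTCGAATG"
document = fasta_line_to_dict(line)
as_json = json.dumps(document)
print(as_json)
```

The same split-once idea carries over to JavaScript. Whether the ">" should stay in the key, as the question's target output shows, is a design choice for the schema.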

Loading FASTA file in R faster than when using read.fasta() from seqinr

耗尽温柔 posted on 2021-01-29 10:13:47
Question: I am currently using the function read.fasta() from the R package seqinr. I think that creating an index file already makes the reading faster, but I was wondering whether there is another function that loads the file faster. I looked at the function read.big.fasta() from PopGenome, but the package has been removed from CRAN and Bioconductor, so I am not so sure about it anymore. Any advice?
Answer 1: You can use readDNAStringSet from Biostrings. Get the human genome: download.file("https:/

Error while writing fasta file using biopython

梦想的初衷 posted on 2020-12-11 22:02:07
Question: I used the following code to write a FASTA sequence to a file:
    from Bio import SeqIO
    sequences = "KKPPLLRR"  # add code here
    output_handle = open("example.fasta", "w")
    SeqIO.write(sequences, output_handle, "fasta")
    output_handle.close()
I got the following error:
    self = <Bio.SeqIO.FastaIO.FastaWriter object at 0x21c1d10>, record = 'M'
    def write_record(self, record):
        """Write a single Fasta record to the file."""
        assert self._header_written
        assert not self._footer_written
        self._record_written
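The traceback shows a single letter arriving as the record, which is what happens when a writer iterates over a bare string: it receives one character at a time instead of record objects. SeqIO.write expects SeqRecord objects, so in Biopython the usual remedy is to wrap the string in a SeqRecord with an id before writing. The sketch below illustrates the same idea without depending on Biopython: Record and write_fasta are hypothetical stand-ins written for this example, not Biopython APIs.

```python
import io
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a sequence record: an identifier plus residues."""
    id: str
    seq: str


def write_fasta(records, handle, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` characters."""
    count = 0
    for record in records:
        handle.write(f">{record.id}\n")
        for start in range(0, len(record.seq), width):
            handle.write(record.seq[start:start + width] + "\n")
        count += 1
    return count


# Iterating a bare string yields single characters, not records.
first_items = list("KKPPLLRR")[:2]

# Wrapping the string in a record gives the writer an id and a sequence.
buffer = io.StringIO()
written = write_fasta([Record("example_protein", "KKPPLLRR")], buffer)
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```

The id "example_protein" is made up; a FASTA header needs some identifier, which is the piece of information a plain string cannot supply.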

AttributeError: 'str' object has no attribute 'id' using BioPython, parsing fasta

时光怂恿深爱的人放手 posted on 2020-06-28 03:21:47
Question: I am trying to use Bio and SeqIO to open a FASTA file that contains multiple sequences, edit the names of the sequences to remove the '.seq' at the end of every name (>SeqID20.seq should become >SeqID20), and then write all the sequences to a new FASTA file. But I get the following error: AttributeError: 'str' object has no attribute 'id'. This is what I started with:
    with open('lots_of_fasta_in_file.fasta') as f:
        for seq_record in SeqIO.parse(f, 'fasta'):
            name, sequence = seq_record.id, str
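An error like this typically appears when plain strings are handed to a function that expects record objects with an .id attribute. The rest of the question's code is cut off, so that is only a likely cause. As a minimal sketch of the renaming itself, assuming the name is the first whitespace-delimited token of each header line, the file can be rewritten as text without any record objects. The function strip_suffix_from_headers and the sample records are hypothetical.

```python
import io


def strip_suffix_from_headers(in_handle, out_handle, suffix=".seq"):
    """Copy FASTA text, removing `suffix` from the end of each sequence name."""
    renamed = 0
    for line in in_handle:
        if line.startswith(">"):
            # The name is the first whitespace-delimited token of the header.
            name, _, description = line[1:].rstrip("\n").partition(" ")
            if suffix and name.endswith(suffix):
                name = name[: -len(suffix)]
                renamed += 1
            line = ">" + name + (" " + description if description else "") + "\n"
        out_handle.write(line)
    return renamed


# Hypothetical input held in memory; real code would use open() on both files.
source = io.StringIO(">SeqID20.seq\nACGT\n>SeqID21.seq sample\nGGCC\n")
target = io.StringIO()
renamed = strip_suffix_from_headers(source, target)
output_text = target.getvalue()
print(output_text, end="")
```

Sequence lines are copied unchanged, so line wrapping in the input is preserved in the output.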

How can I count the frequency of letters

柔情痞子 posted on 2020-01-22 16:54:21
Question: I have data like this:
    >sp|Q96A73|P33MX_HUMAN Putative monooxygenase p33MONOX OS=Homo sapiens OX=9606 GN=KIAA1191 PE=1 SV=1
    RNDDDDTSVCLGTRQCSWFAGCTNRTWNSSAVPLIGLPNTQDYKWVDRNSGLTWSGNDTCLYSCQNQTKGLLYQLFRNLFCSYGLTEAHGKWRCADASITNDKGHDGHRTPTWWLTGSNLTLSVNNSGLFFLCGNGVYKGFPPKWSGRCGLGYLVPSLTRYLTLNASQITNLRSFIHKVTPHR
    >sp|P13674|P4HA1_HUMAN Prolyl 4-hydroxylase subunit alpha-1 OS=Homo sapiens OX=9606 GN=P4HA1 PE=1 SV=2
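The question is cut off before it says which language or output format is wanted, so the following is only one possible reading: count how often each letter occurs in a sequence. In Python, collections.Counter does this directly. The helper name letter_frequencies is an assumption, and the fragment is just the first ten residues of the first record above.

```python
from collections import Counter


def letter_frequencies(sequence):
    """Return a Counter mapping each letter to how often it occurs in the sequence."""
    return Counter(sequence.upper())


# First ten residues of the first record in the question, used as a short example.
fragment = "RNDDDDTSVC"
counts = letter_frequencies(fragment)
for letter, n in counts.most_common():
    print(letter, n)
```

For a whole file, the same function would be applied to each record's sequence after the header lines have been separated out.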

How to find a query that you can find with FASTA but not with BLAST and vice versa?

岁酱吖の posted on 2020-01-17 03:53:26
Question: I need to find a sequence or sequences that give results (hits) in FASTA but not in BLAST, or vice versa, and I am kind of lost. What should I look for when searching for such sequences?
Answer 1: When you say "find a sequence by BLAST or FASTA", I assume you mean finding a hit in the database? I think FASTA might be better than BLAST at finding alignments between dissimilar sequences, while BLAST is better at aligning similar sequences.
Source: https://stackoverflow.com/questions/26491285/how-to-find-a

Remove line breaks in a FASTA file

半城伤御伤魂 posted on 2019-12-28 11:48:14
Question: I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:
    >accession1
    ATGGCCCATG
    GGATCCTAGC
    >accession2
    GATATCCATG
    AAACGGCTTA
I'd like to convert it into this:
    >accession1
    ATGGCCCATGGGATCCTAGC
    >accession2
    GATATCCATGAAACGGCTTA
I found a potential solution on this site, which looks like this:
    cat input.fasta | awk '{if (substr($0,1,1)==">"){if (p){print "\n";} print $0} else printf("%s",$0);p++;}END{print "\n"}' >
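As an alternative to the awk one-liner, here is a minimal Python sketch of the same unwrapping: header lines are copied as they are, and the sequence lines that follow each header are joined into one line. The function name unwrap_fasta is an assumption, and the in-memory handles stand in for real input and output files.

```python
import io


def unwrap_fasta(in_handle, out_handle):
    """Join each record's wrapped sequence lines into a single line."""
    chunks = []
    for line in in_handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Flush the previous record's sequence before starting a new one.
            if chunks:
                out_handle.write("".join(chunks) + "\n")
                chunks = []
            out_handle.write(line + "\n")
        else:
            chunks.append(line)
    if chunks:
        out_handle.write("".join(chunks) + "\n")


# The example file from the question, held in memory.
wrapped = io.StringIO(
    ">accession1\nATGGCCCATG\nGGATCCTAGC\n>accession2\nGATATCCATG\nAAACGGCTTA\n"
)
unwrapped = io.StringIO()
unwrap_fasta(wrapped, unwrapped)
result = unwrapped.getvalue()
print(result, end="")
```

Unlike the awk command, this version writes exactly one newline after each sequence and no extra blank lines between records.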