“average length of the sequences in a fasta file”: Can you improve this Erlang code?


无人共我 2021-02-06 12:29

I'm trying to get the mean length of fasta sequences using Erlang. A fasta file looks like this:

>title1
ATGACTAGCTAGCAGCGATCGACCGTCGTACGC
AT


      
      
        
5 answers

天命终不由人 (original poster) 2021-02-06 13:08

It looks like your big performance problems have been solved by opening the file in raw mode, but here are some more thoughts if you need to optimise that code further.

Learn and use fprof, so that you measure where the time actually goes before optimising anything else.
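As a rough sketch of what an fprof run can look like: the module name `profile_demo` and the `work/1` function are hypothetical stand-ins for your own code, and the output file name is just an example. The calls `fprof:apply/3`, `fprof:profile/0` and `fprof:analyse/1` come from the `tools` application that ships with Erlang/OTP.

```erlang
-module(profile_demo).
-export([run/0, work/1]).

%% Hypothetical workload standing in for the sequence-length code.
work(N) ->
    lists:sum([length(integer_to_list(I)) || I <- lists:seq(1, N)]).

run() ->
    %% Trace a single call; the trace goes to "fprof.trace" by default.
    fprof:apply(?MODULE, work, [1000]),
    %% Turn the raw trace into profiling data.
    ok = fprof:profile(),
    %% Write the per-function report to a file instead of the console.
    fprof:analyse([{dest, "fprof.analysis"}]).
```

The report lists accumulated and own time per function, which should show whether the time goes into stripping, reading, or counting.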

You're using string:strip/1 primarily to remove the trailing newline. As Erlang values are immutable, you have to make a complete copy of the list (with all the associated memory allocation) just to remove the last character. If you know the file is well formed, just subtract one from your count; otherwise I'd try writing a length function that counts the number of relevant characters and ignores irrelevant ones.
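Such a length function could look roughly like this. It is only a sketch: the module name `seq_len` is made up, and treating newline, carriage return, space and tab as the "irrelevant" characters is my assumption about what your files contain.

```erlang
-module(seq_len).
-export([count/1]).

%% Count the characters of a line (a list of character codes) in one
%% pass, skipping whitespace, without building a stripped copy.
count(Line) ->
    count(Line, 0).

count([], N) ->
    N;
count([C | Rest], N) when C =:= $\n; C =:= $\r; C =:= $\s; C =:= $\t ->
    %% Irrelevant character: do not count it.
    count(Rest, N);
count([_ | Rest], N) ->
    count(Rest, N + 1).
```

Because it is tail recursive and only carries an integer accumulator, it allocates nothing per line beyond what reading the line already cost.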

I'm wary of advice that says binaries are better than lists, but given how little processing you do, it's probably the case here. The first steps are to open the file in binary mode and use erlang:size/1 to find the length.
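A minimal sketch of the binary approach, under some assumptions: the module name `fasta_mean` is hypothetical, the file is taken to be well formed with every line ending in a single "\n", and header lines are the ones starting with ">". I use `byte_size/1` here; `erlang:size/1` returns the same value for a binary.

```erlang
-module(fasta_mean).
-export([mean_length/1]).

%% Mean sequence length of a FASTA file, reading lines as binaries.
%% Assumes every line, including the last one, ends in "\n".
mean_length(Path) ->
    {ok, Fd} = file:open(Path, [read, raw, binary, read_ahead]),
    try loop(Fd, 0, 0) of
        {_Total, 0} -> {error, no_sequences};
        {Total, Sequences} -> {ok, Total / Sequences}
    after
        file:close(Fd)
    end.

loop(Fd, Total, Sequences) ->
    case file:read_line(Fd) of
        eof ->
            {Total, Sequences};
        {ok, <<">", _/binary>>} ->
            %% Header line: one more sequence, no bases to count.
            loop(Fd, Total, Sequences + 1);
        {ok, Line} ->
            %% Sequence line: subtract one for the trailing newline.
            loop(Fd, Total + byte_size(Line) - 1, Sequences)
    end.
```

Finding the size of a binary is a constant-time operation, whereas `length/1` on a list has to walk every element, so no per-character work is left in the loop.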

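It won't affect performance (significantly), but the multiplication by 1.0 in Total/(1.0*Sequences) is only necessary in languages where dividing two integers truncates the result. In Erlang the / operator always returns a float; truncating integer division is a separate operator, div.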
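To illustrate the point about division, here is a tiny sketch; the module and function names are hypothetical, and `Total` and `Sequences` are taken to be integers as in the expression above.

```erlang
-module(div_demo).
-export([mean/2]).

%% "/" yields a float even when both operands are integers,
%% so no multiplication by 1.0 is needed.
mean(Total, Sequences) when is_integer(Total), is_integer(Sequences), Sequences > 0 ->
    Total / Sequences.
```

If you ever do want the truncated integer result, that is what `Total div Sequences` gives you.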
    
             
                                                        
            
            
              
                