Bash while read loop extremely slow compared to cat, why?


A simple test script here:

LINECOUNT=0
while read -r LINE; do
        LINECOUNT=$((LINECOUNT + 1))
        # print the running count every 1000 lines
        if (( LINECOUNT % 1000 == 0 )); then echo "$LINECOUNT"; fi
done


                      
4 answers

Answer by 清歌不尽 (2020-12-01 19:34):

The reason while read is so slow is that, when its input is a pipe, the shell has to make a read system call for every byte. It cannot read a large buffer from the pipe, because the shell must not consume more than one line from the input stream, so it has to check each character against a newline as it arrives. If you run strace on a while read loop, you can see this behavior. It is deliberate, because it makes it possible to reliably do things like:

while read size; do dd bs=$size count=1 of=file$(( i++ )); done


in which the commands inside the loop are reading from the same stream that the shell reads from.  If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data.  An unfortunate side-effect is that read is absurdly slow.
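A small sketch of that guarantee, assuming bash and a pipe as input (the function name split_stream is made up for illustration): read takes exactly one line, and cat, an external command, still receives everything after it.

```shell
#!/usr/bin/env bash
# split_stream (hypothetical helper): the shell builtin reads one line,
# then an external command reads whatever is left on the same stream.
split_stream() {
    local first rest
    IFS= read -r first      # consumes exactly one line from stdin
    rest=$(cat)             # cat sees everything after that line
    printf 'read got: %s\n' "$first"
    printf 'cat got: %s\n' "$rest"
}

printf 'first\nsecond\nthird\n' | split_stream
```

If read had pulled a large buffer from the pipe, cat would have found the stream already empty.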
                                                                        
                                                        
            
            
              
                
Answer by 难免孤独 (2020-12-01 19:35):

Not really sure what your script is supposed to do. So this might not be an answer to your question but more of a generic tip.

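Don't cat your file and pipe it into your script. When a bash script reads from a file, redirect the file into the loop instead. With a regular file on standard input, bash can usually read in larger blocks and seek back to the end of the line, which it cannot do on a pipe: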

# IFS= and -r keep leading whitespace and backslashes intact
while IFS= read -r line
do
    echo "$line"
done < file.txt
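
To compare the two ways of feeding the loop, something like the following could be used. It is only a sketch: count_lines and the temporary sample file are assumptions for illustration, and it only shows that both forms give the same result.

```shell
#!/usr/bin/env bash
# count_lines (hypothetical helper): count lines on standard input
# with the same while-read pattern as above.
count_lines() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# Sample data: 5000 short lines in a temporary file.
tmpfile=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 5000; i++) print "line " i }' > "$tmpfile"

piped=$(cat "$tmpfile" | count_lines)     # loop input is a pipe
redirected=$(count_lines < "$tmpfile")    # loop input is the file itself

echo "piped: $piped, redirected: $redirected"
rm -f "$tmpfile"
```

Prefixing each of the two calls with bash's time keyword, on a larger file, is one way to measure the difference on your own system.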

                                                                        
                                                        
            
            
              
                
Answer by 粉色の甜心 (2020-12-01 19:48):

It's because the bash script is interpreted and not really optimised for speed in this case. You're usually better off using one of the external tools such as:

awk 'NR%1000==0{print}' inputFile


which matches your "print every 1000 lines" sample.

If you wanted to (for each line) output the line count in characters followed by the line itself, and pipe it through another process, you could also do that:

awk '{print length($0)" "$0}' inputFile | someOtherProcess


Tools like awk, sed, grep, cut and the more powerful perl are far more suited to these tasks than an interpreted shell script.
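
Note that the test script in the question echoes the running count rather than the line itself. If that is what is wanted, a variant that prints awk's built-in record counter NR might look like this (progress_counts is a made-up wrapper name, and the sample input is generated only for the demonstration):

```shell
#!/usr/bin/env bash
# progress_counts (hypothetical wrapper): print the running line count
# every 1000 lines, like the echo of LINECOUNT in the question.
progress_counts() {
    awk 'NR % 1000 == 0 { print NR }' "$@"
}

# 2500 identical lines as sample input, so two progress reports appear.
awk 'BEGIN { for (i = 0; i < 2500; i++) print "x" }' | progress_counts
```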
                                                                        
                                                        
            
            
              
                
Answer by 我寻月下人不归 (2020-12-01 19:48):

A Perl solution that counts the bytes in each line:

perl -p -e '
use Encode;
print length(Encode::encode_utf8($_))."\n";$_=""' 


For example:

dd if=/dev/urandom bs=1M count=100 |
   perl -p -e 'use Encode;print length(Encode::encode_utf8($_))."\n";$_=""' |
   tail


For me this runs at 7.7 MB/s.

For comparison, to see how much of that is due to the script, here is the same dd without it:

dd if=/dev/urandom bs=1M count=100 >/dev/null


This runs at 9.1 MB/s.

So the script does not seem to be that slow :)
                                                                        
                                                        
            
            
              
                