I want to get the maximum number in a file, where the numbers are integers that can occur anywhere in the file.
I thought about the following approach, which I suspect will be the fastest:
$ tr ' ' '\n' < file | sort -rn | head -1
42342234
Timing the third run (by then the file should be cached):
$ time tr ' ' '\n' < file | sort -rn | head -1
42342234
real 0m0.078s
user 0m0.000s
sys 0m0.076s
By the way, DON'T WRITE SHELL LOOPS to manipulate text, even if it's just to create sample input files:
$ time awk -v s="$(cat a)" 'BEGIN{for (i=1;i<=50000;i++) print s}' > myfile
real 0m0.109s
user 0m0.031s
sys 0m0.061s
$ wc -l myfile
150000 myfile
Compare that with the shell loop suggested in the question:
$ time for i in {1..50000}; do cat a >> myfile2 ; done
real 26m38.771s
user 1m44.765s
sys 17m9.837s
$ wc -l myfile2
150000 myfile2
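A side note on that awk command: -v s="$(cat a)" makes awk expand backslash escapes in the assigned value, so if the sample data contains backslashes it gets altered on the way in. A loop-free sketch that avoids this is to let awk read the file itself and replay it (same file a and repeat count 50000 as above):
$ awk '{line[NR]=$0} END{for (r=1; r<=50000; r++) for (i=1; i<=NR; i++) print line[i]}' a > myfile
It holds the whole sample file in memory, which is fine for a 3-line file like a.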
If we want something that handles input files more robustly, where digits can also appear in strings that are not integers (decimals, dates, prices and the like), we need something like this:
$ cat b
hello 123 how are you i am fine 42342234 and blab bla bla
and 3624 is another number
but this is not enough for -23 234245
73 starts a line
avoid these: 3.14 or 4-5 or $15 or 2:30 or 05/12/2015
$ grep -o -E '(^| )[-]?[0-9]+( |$)' b | sort -rn
42342234
3624
123
73
-23
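Two quirks are visible in that output: grep -o keeps the delimiting spaces as part of each match (which is why the timed version below also pipes through tr -d ' '), and because the trailing space is consumed by the match, the second of two integers separated by a single space is skipped, which is why 234245 from the "-23 234245" line is missing above. If GNU grep with PCRE support is available, a sketch using zero-width lookarounds keeps the delimiters out of the matches:
$ grep -oP '(?<![^ ])-?[0-9]+(?![^ ])' b | sort -rn
On the b file above this should list 234245 between 42342234 and 3624 while still rejecting 3.14, 4-5, $15, 2:30 and 05/12/2015, but it needs grep -P, which not every grep has.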
$ time awk -v s="$(cat b)" 'BEGIN{for (i=1;i<=50000;i++) print s}' > myfileB
real 0m0.109s
user 0m0.000s
sys 0m0.076s
$ wc -l myfileB
250000 myfileB
$ time grep -o -E '(^| )-?[0-9]+( |$)' myfileB | sort -rn | head -1 | tr -d ' '
42342234
real 0m2.480s
user 0m2.509s
sys 0m0.108s
Note that this input file has more lines than the original, and with it the robust grep solution above is actually faster than the original tr solution I posted at the start of this question:
$ time tr ' ' '\n' < myfileB | sort -rn | head -1
42342234
real 0m4.836s
user 0m4.445s
sys 0m0.277s
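For completeness, here is a single-pass sketch in awk that applies the same plain-integer test to every whitespace-separated field and tracks the maximum directly, with no intermediate sort (untimed here):
$ awk '{for (i=1; i<=NF; i++) if ($i ~ /^-?[0-9]+$/ && (!n++ || $i+0 > max)) max = $i+0} END{if (n) print max}' myfileB
For this input it should print 42342234. Unlike the grep pattern it also considers 234245, since whole fields rather than space-delimited matches are tested, and whether it beats the pipelines above is worth timing on the real data.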