I want to find the maximum number in a file, where the numbers are integers that can occur anywhere in the file.
I thought about doing the following:
In awk you can say:
awk '{for(i=1;i<=NF;i++)if(int($i)){a[$i]=$i}}END{x=asort(a);print a[x]}' file
In my experience, awk is the fastest text-processing tool for most tasks; the only things of comparable speed I have seen (on Linux systems) are programs written in C/C++.
In the code above, using a minimal set of functions and commands allows for faster execution.
for(i=1;i<=NF;i++) - Loops through the fields on the line. Using the default FS/RS and looping
this way is usually faster than using custom ones, as awk is optimised
for the defaults.
if(int($i)) - Checks that the field is not equal to zero; since int() evaluates
strings to zero, the next block is skipped when the field is a
string. I believe this is the quickest way to perform the check.
{a[$i]=$i} - Sets an array element with the number as both key and value. This
means there are only as many array elements as there are distinct
numbers in the file, which should be quicker than comparing every
number.
END{x=asort(a)} - At the end of the file, sorts the array with asort() and stores
the size of the array in x.
print a[x] - Prints the last (largest) element of the array.
Mine:
time awk '{for(i=1;i<=NF;i++)if(int($i)){a[$i]=$i}}END{x=asort(a);print a[x]}' file
took
real 0m0.434s
user 0m0.357s
sys 0m0.008s
hek2mgl's:
awk '{m=(m<$0 && int($0))?$0:m}END{print m}' RS='[[:space:]*]' file
took
real 0m1.256s
user 0m1.134s
sys 0m0.019s
For those wondering why it is faster: it uses the default FS and RS, which awk is optimised for.
Changing
awk '{m=(m<$0 && int($0))?$0:m}END{print m}' RS='[[:space:]*]'
to
awk '{for(i=1;i<=NF;i++)m=(m<$i && int($i))?$i:m}END{print m}'
provides the time
real 0m0.574s
user 0m0.497s
sys 0m0.011s
Which is still a little slower than my command.
I believe the slight difference that remains is because asort()
only has to work on around 6 numbers, as each number is saved only once in the array.
In comparison, the other command performs a comparison on every single number in the file, which is more computationally expensive.
I think they would be around the same speed if all the numbers in the file were unique.
Tom Fenech's:
time awk -v RS="[^-0-9]+" '$0>max{max=$0}END{print max}' myfile
real 0m0.716s
user 0m0.612s
sys 0m0.013s
A drawback of this approach, though, is that if all the numbers are below zero then max will be blank.
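This is easy to reproduce with a small all-negative file (neg.txt is just an illustration):

```shell
# Every number is below zero
printf 'a -5 b -3\n' > neg.txt

# max is never assigned: each record compares below the initial empty/zero value
awk -v RS="[^-0-9]+" '$0>max{max=$0}END{print max}' neg.txt
# prints a blank line
```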
Glenn Jackman's:
time awk 'NR==1 || max < 0+$0 {max=0+$0} END {print max}' RS='[[:space:]]+' file
real 0m1.492s
user 0m1.258s
sys 0m0.022s
and
time perl -MList::Util=max -0777 -nE 'say max /-?\d+/g' file
real 0m0.790s
user 0m0.686s
sys 0m0.034s
The good thing about perl -MList::Util=max -0777 -nE 'say max /-?\d+/g'
is that it is the only answer here that works if 0 is the largest number in the file, and it also works if all the numbers are negative.
All times are the average of 3 runs.