How to get the biggest number in a file?

盖世英雄少女心 2020-12-04 00:41

I want to find the largest number in a file, where the numbers are integers that can occur anywhere in the file.

I thought about doing the following:

         


        
4 Answers
  •  北海茫月
    2020-12-04 01:15

    In awk you can say:

    awk '{for(i=1;i<=NF;i++)if(int($i)){a[$i]=$i}}END{x=asort(a);print a[x]}' file
    

    Explanation

    In my experience awk is the fastest text-processing language for most tasks, and the only things I have seen with comparable speed (on Linux systems) are programs written in C/C++.

    The code above keeps functions and commands to a minimum, which allows for faster execution.

    for(i=1;i<=NF;i++) - Loops through fields on the line. Using the default FS/RS and looping
                         this way is usually faster than using custom ones as awk is optimised 
                         to use the default
    
    if(int($i))        - Checks that the field is non-zero after conversion by int(). A 
                         non-numeric string converts to zero, so the next block is skipped 
                         for it; a literal 0 is skipped as well. I believe this is the 
                         quickest way to perform this check
    
    {a[$i]=$i}         - Sets an array variable with the number as key and value. This means 
                         there will only be as many array variables as there are numbers in 
                         the file and will hopefully be quicker than a comparison of every 
                         number 
    
    END{x=asort(a)     - At the end of the file, use asort (a gawk extension) on the array 
                         and store the size of the array in x.
    
    print a[x]         - Print the last element in the array.           
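
To make the int() check concrete, here is a small sketch that prints each field next to what int() returns for it. The helper name show_int is made up for illustration, and the sketch relies only on POSIX awk behaviour.

```shell
# show_int is a hypothetical helper: print every field alongside int(field).
show_int() {
    awk '{ for (i = 1; i <= NF; i++) print $i, int($i) }' "$@"
}

printf 'abc 12 0 -7 3x\n' | show_int
# abc 0
# 12 12
# 0 0
# -7 -7
# 3x 3
```

With the if(int($i)) test, abc and 0 are filtered out, while 3x passes the check even though it is not a clean integer.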
    

    Benchmark

    Mine:

    time awk '{for(i=1;i<=NF;i++)if(int($i)){a[$i]=$i}}END{x=asort(a);print a[x]}' file
    

    took

    real    0m0.434s
    user    0m0.357s
    sys     0m0.008s
    

    hek2mgl's:

    awk '{m=(m<$0 && int($0))?$0:m}END{print m}' RS='[[:space:]*]' file
    

    took

    real    0m1.256s
    user    0m1.134s
    sys     0m0.019s
    

    For those wondering why it is faster: it uses the default FS and RS, which awk is optimised for.

    Changing

    awk '{m=(m<$0 && int($0))?$0:m}END{print m}' RS='[[:space:]*]'
    

    to

    awk '{for(i=1;i<=NF;i++)m=(m<$i && int($i))?$i:m}END{print m}'
    

    provides the time

    real    0m0.574s
    user    0m0.497s
    sys     0m0.011s
    

    Which is still a little slower than my command.

    I believe the slight difference that remains is because asort() only has to sort around 6 numbers, as each distinct value is saved only once in the array.

    In comparison, the other command is performing a comparison on every single number in the file which will be more computationally expensive.

    I think they would be around the same speed if all the numbers in the file were unique.
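
One way to check that assumption on a given file is to compare how many integer fields it contains in total with how many distinct values there are. A rough sketch, where count_numbers is a hypothetical helper and integers are assumed to be whole whitespace-separated fields:

```shell
# count_numbers is a hypothetical helper: print "<total> <distinct>" integer fields.
count_numbers() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^-?[0-9]+$/) {       # whole field is an integer
                    total++
                    if (!($i in seen)) {       # first time this value appears
                        seen[$i]
                        distinct++
                    }
                }
        }
        END { print total + 0, distinct + 0 }  # "+ 0" prints 0 instead of blank
    ' "$@"
}

printf '1 2 2 a\n3 1\n' | count_numbers   # prints: 5 3
```

If the two counts are close, storing the values in an array saves little over a running comparison.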


    Tom Fenech's:

    time awk -v RS="[^-0-9]+" '$0>max{max=$0}END{print max}' myfile
    
    real    0m0.716s
    user    0m0.612s
    sys     0m0.013s
    

    A drawback of this approach, though, is that if all the numbers are below zero then max will be blank.
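
The blank result comes from comparing against an uninitialised max, which awk treats as 0 in a numeric comparison, so no negative number ever replaces it. A minimal sketch of the effect, assuming one number per line to keep the default RS; naive_max and seeded_max are hypothetical names:

```shell
# naive_max mirrors the '$0>max' pattern; max starts out uninitialised.
naive_max()  { awk '$0 > max { max = $0 } END { print max }' "$@"; }

# seeded_max takes the first record as the starting maximum instead.
seeded_max() { awk 'NR == 1 || $0 + 0 > max { max = $0 + 0 } END { print max }' "$@"; }

printf '%s\n' -5 -2 -9 | naive_max    # prints an empty line
printf '%s\n' -5 -2 -9 | seeded_max   # prints -2
```

Seeding from the first record is the same idea as the NR==1 test in Glenn Jackman's command.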


    Glenn Jackman's:

    time awk 'NR==1 || max < 0+$0 {max=0+$0} END {print max}' RS='[[:space:]]+' file
    
    real    0m1.492s
    user    0m1.258s
    sys     0m0.022s
    

    and

    time perl -MList::Util=max -0777 -nE 'say max /-?\d+/g' file
    
    real    0m0.790s
    user    0m0.686s
    sys     0m0.034s
    

    The good thing about perl -MList::Util=max -0777 -nE 'say max /-?\d+/g' is that it is the only answer here that works when 0 is the largest number in the file, and it also works when all the numbers are negative.
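
For comparison, an awk command can also cope with 0 and with all-negative input if it tracks whether any number has been seen yet. This is only a sketch and was not part of the benchmark: max_int is a hypothetical helper, and it assumes integers are whole whitespace-separated fields, so unlike the perl regex it ignores digits embedded in words.

```shell
# max_int is a hypothetical helper; extra arguments are passed to awk as file names.
max_int() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^-?[0-9]+$/) {            # accept only whole-field integers
                    n = $i + 0                      # force a numeric value
                    if (!seen || n > max) {         # first number, or a new maximum
                        max = n
                        seen = 1
                    }
                }
        }
        END { if (seen) print max }                 # print nothing if no integers found
    ' "$@"
}

printf 'a -3 b\n-10 x\n' | max_int   # prints -3
printf '0 foo\n-1\n'     | max_int   # prints 0
```

It keeps the default FS and RS, in line with the speed argument made earlier.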


    Notes

    All times are the average of 3 runs.
