I want to get the maximum number in a file, where the numbers are integers that can occur anywhere in the file.
I thought about the following approach, which I suspect will be the fastest:
$ tr ' ' '\n' < file | sort -rn | head -1
42342234
Timing the third run (by then the file should be cached):
$ time tr ' ' '\n' < file | sort -rn | head -1
42342234
real 0m0.078s
user 0m0.000s
sys 0m0.076s
By the way, DON'T WRITE SHELL LOOPS to manipulate text, even if it's just to create sample input files:
$ time awk -v s="$(cat a)" 'BEGIN{for (i=1;i<=50000;i++) print s}' > myfile
real 0m0.109s
user 0m0.031s
sys 0m0.061s
$ wc -l myfile
150000 myfile
Compare that with the shell loop suggested in the question:
$ time for i in {1..50000}; do cat a >> myfile2 ; done
real 26m38.771s
user 1m44.765s
sys 17m9.837s
$ wc -l myfile2
150000 myfile2
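A side note on that awk command: -v s="$(cat a)" makes awk expand backslash escapes in the assigned value, so if the sample data contains backslashes it gets altered on the way in. A loop-free sketch that avoids this is to let awk read the file itself and replay it (same file a and repeat count 50000 as above):
$ awk '{line[NR]=$0} END{for (r=1; r<=50000; r++) for (i=1; i<=NR; i++) print line[i]}' a > myfile
It holds the whole sample file in memory, which is fine for a 3-line file like a.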
If we want something that handles input files more robustly, where digits can also appear in strings that are not integers (decimals, dates, prices and the like), we need something like this:
$ cat b
hello 123 how are you i am fine 42342234 and blab bla bla
and 3624 is another number
but this is not enough for -23 234245
73 starts a line
avoid these: 3.14 or 4-5 or $15 or 2:30 or 05/12/2015
$ grep -o -E '(^| )[-]?[0-9]+( |$)' b | sort -rn
42342234
3624
123
73
-23
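Two quirks are visible in that output: grep -o keeps the delimiting spaces as part of each match (which is why the timed version below also pipes through tr -d ' '), and because the trailing space is consumed by the match, the second of two integers separated by a single space is skipped, which is why 234245 from the "-23 234245" line is missing above. If GNU grep with PCRE support is available, a sketch using zero-width lookarounds keeps the delimiters out of the matches:
$ grep -oP '(?<![^ ])-?[0-9]+(?![^ ])' b | sort -rn
On the b file above this should list 234245 between 42342234 and 3624 while still rejecting 3.14, 4-5, $15, 2:30 and 05/12/2015, but it needs grep -P, which not every grep has.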
$ time awk -v s="$(cat b)" 'BEGIN{for (i=1;i<=50000;i++) print s}' > myfileB
real 0m0.109s
user 0m0.000s
sys 0m0.076s
$ wc -l myfileB
250000 myfileB
$ time grep -o -E '(^| )-?[0-9]+( |$)' myfileB | sort -rn | head -1 | tr -d ' '
42342234
real 0m2.480s
user 0m2.509s
sys 0m0.108s
Note that this input file has more lines than the original, and with it the robust grep solution above is actually faster than the original tr solution I posted at the start of this question:
$ time tr ' ' '\n' < myfileB | sort -rn | head -1
42342234
real 0m4.836s
user 0m4.445s
sys 0m0.277s
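For completeness, here is a single-pass sketch in awk that applies the same plain-integer test to every whitespace-separated field and tracks the maximum directly, with no intermediate sort (untimed here):
$ awk '{for (i=1; i<=NF; i++) if ($i ~ /^-?[0-9]+$/ && (!n++ || $i+0 > max)) max = $i+0} END{if (n) print max}' myfileB
For this input it should print 42342234. Unlike the grep pattern it also considers 234245, since whole fields rather than space-delimited matches are tested, and whether it beats the pipelines above is worth timing on the real data.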