How can I retain numbers for sorting them later?

Question

I have a problem that sounds like this: Write a shell script that for each file from the command line will output the number of words that are longer than the number k read from keyboard. The output must be ordered by the number of words.

How can I retain the count for each file so that I can sort the counts afterwards?

I tried something like this:

#!/bin/bash
if [ #@ -ne 1 ]
        then exit 1
fi
array[$@]=''
echo -n "Give the number>"
read k
for f in $@;
do
        n=`$f | wc -c`
        if [ $n -gt $k ];
        then
                i++
                array[i]=$n
        fi
done
echo {array[@]} | sort -n

Answer 1:

The challenge is:

Write a shell script that for each file from the command line will output the number of words that are longer than the number k read from keyboard. The output must be ordered by the number of words.

I decline to answer prompts — commands take arguments. I'll go with William Pursell's suggestion that the number is the first argument — it is a reasonable solution. An alternative uses an option like -l 23 for the length (and other options to tweak other actions).

The solutions I see so far are counting the number of words, but not the number of words longer than the given length. This is a problem. For that, I think awk is appropriate:

awk -v min="$k" '{ for (i = 1; i <= NF; i++) if (length($i) >= min) print $i; }'

This prints the words of at least min characters, one per line, on standard output. (The task says "longer than", so change >= to > if the limit is meant strictly.) We'll do this one file at a time, at least in the first pass.

We can then count the number of such words with wc -l. Finally, we can sort the data numerically.
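To see the filter and the count in isolation before they go into a script, here is a minimal sketch run on a made-up sentence. The function name long_words and the sample text are only for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print each whitespace-separated word of at least
# "min" characters (first argument), one per line, reading standard input.
long_words() {
    awk -v min="$1" '{ for (i = 1; i <= NF; i++) if (length($i) >= min) print $i }'
}

sample='the quick brown fox jumps over the lazy dog'

# The words with 5 or more characters: quick, brown, jumps
printf '%s\n' "$sample" | long_words 5

# Counting those lines gives the number of long words (3 here)
printf '%s\n' "$sample" | long_words 5 | wc -l
```

Some versions of wc pad the number with leading blanks; sort -n ignores them, so the padding does not affect the final ordering.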

Putting it all together yields:

#!/bin/bash

case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac

k=${1:?"Cannot provide an empty length"}
shift

for file in "$@"
do
    echo "$(awk -v min="$k" '{ for (i = 1; i <= NF; i++)
                                 if (length($i) >= min) print $i
                           }' "$file" |
            wc -l) $file"
done | sort -n

This lists the files with the most long words last; that's convenient because the most interesting files are at the end of the list. If you want the high numbers first, add -r to the sort.

Of course, if we're using awk, we can improve things. It can count the number of long words in each file, and print the file name and the number, so there'd be just a single invocation of awk for all the files. It takes a little bit more programming, though:

#!/bin/sh

case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac

k=${1:?"Cannot provide an empty length"}
shift

awk -v min="$k" '
    FILENAME != oldfile { if (oldfile != "") { print longwords, oldfile }
                          oldfile = FILENAME; longwords = 0
                        }
    { for (i = 1; i <= NF; i++) if (length($i) >= min) longwords++ }
    END { if (oldfile != "") { print longwords, oldfile } }
    ' "$@" |
sort -n

If you have GNU awk, there are even ways to sort the results built into awk.
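As one possible illustration of that, the sketch below assumes gawk 4.0 or later. It keeps a per-file counter in an array and sets PROCINFO["sorted_in"] so that the final loop visits the entries in ascending numeric order of their values. The function name longwords_sorted is made up for this example, and BEGINFILE is a gawk extension used here so that files with no long words still show up with a count of 0.

```bash
#!/bin/bash
# Hypothetical helper (requires GNU awk): longwords_sorted LENGTH FILE...
# Prints "count filename" for each file, smallest count first, without sort.
longwords_sorted() {
    local k=$1
    shift
    gawk -v min="$k" '
        # Create the entry for every file, even one with no long words
        BEGINFILE { count[FILENAME] += 0 }
        { for (i = 1; i <= NF; i++) if (length($i) >= min) count[FILENAME]++ }
        END {
            # Traverse the array by ascending numeric value
            PROCINFO["sorted_in"] = "@val_num_asc"
            for (name in count) print count[name], name
        }
    ' "$@"
}

# Example call: longwords_sorted 5 chapter1.txt chapter2.txt
```

Using "@val_num_desc" instead would list the files with the most long words first. Because the array is keyed by file name, a file named twice on the command line is reported once with the combined count.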

Answer 2:

You can simplify the script a bit:

#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
wc -w "$@" | sed '$d' | gawk -v k="$k" '$1>k{print $0}' | sort -nr

where

read -r -p ... - prompts for and reads the input
wc -w - counts the words in each file given as an argument
sed '$d' - deletes the last line (the "total" line that wc prints when it is given more than one file)
awk - keeps only the lines whose count is greater than k
sort -nr - sorts the output numerically, largest first
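One caveat with the pipeline above: wc prints the "total" line only when it is given more than one file, so with a single file the sed '$d' step removes the only result. A minimal sketch of a variant that avoids the total line altogether, by running wc once per file on standard input, might look like this (the function name files_over is made up for the example):

```bash
#!/bin/bash
# Hypothetical helper: files_over WORDCOUNT FILE...
# Prints "count filename" for every file with more than WORDCOUNT words,
# largest count first.
files_over() {
    local k=$1 f
    shift
    for f in "$@"; do
        # Reading from stdin makes wc print only the number, never a total line
        printf '%d %s\n' "$(wc -w < "$f")" "$f"
    done | awk -v k="$k" '$1 > k' | sort -nr
}

# Example call: files_over 100 notes.txt report.txt
```

This starts one wc process per file, which is slower for many files but behaves the same whether one file or several are given.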

With the great help of @Tom Fench, it can be simplified to:

wc -w "$@" | awk -v k="$k" 'NR>1&&p>k{print p}{p=$1}' | sort -nr

or, with filenames (based on @Wintermute's comment):

wc -w "$@" | awk -v k="$k" 'p { print p; p="" } $1 > k { p = $0 }' | sort -nr

EDIT

Based on @Jonathan Leffler's comment, here is a variant that counts the words longer than k in each file.

#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
let k++
grep -HoP "\b\w{${k:-3},}\b" "$@" |\
 awk -F: '{f[$1]++}END{for(n in f)print f[n],n}' |\
 sort -nr

Where:

The grep command finds the words that are longer than the entered number (omit the let line if you want words of that length or longer). It prints lines like:

file1:word1
file1:word2
...
file2:wordx
file2:wordy

The awk command then counts the lines by their first field, i.e. by file name.
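The -P option is specific to GNU grep, and splitting on ":" miscounts files whose names contain a colon. As a rough alternative, and only a sketch, the same count can be taken one file at a time with an extended regular expression. It treats a word as a run of letters, digits and underscores, which is an approximation of \w, and the function name count_longer_than is made up here.

```bash
#!/bin/bash
# Hypothetical helper: count_longer_than K FILE
# Prints the number of words in FILE that have more than K characters.
count_longer_than() {
    local k=$1 file=$2
    # Each match is a maximal run of word characters with at least K+1 of them;
    # -o prints one match per line, so counting lines counts the long words.
    grep -oE "[[:alnum:]_]{$((k + 1)),}" "$file" | wc -l | tr -d ' '
}

# Example use, one line per file, largest count first:
#   for f in "$@"; do
#       printf '%s %s\n' "$(count_longer_than "$k" "$f")" "$f"
#   done | sort -nr
```

Unlike the grep -H version, this also reports files that contain no long words, with a count of 0.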

Source: https://stackoverflow.com/questions/29184901/how-can-i-retain-numbers-for-sorting-them-later

Tags

Linux

bash

shell