Question
I have a problem that sounds like this: Write a shell script that for each file from the command line will output the number of words that are longer than the number k read from keyboard. The output must be ordered by the number of words.
How can I retain the number of characters of each file, for sorting them?
I tried something like this:
#!/bin/bash
if [ #@ -ne 1 ]
then exit 1
fi
array[$@]=''
echo -n "Give the number>"
read k
for f in $@;
do
    n=`$f | wc -c`
    if [ $n -gt $k ];
    then
        i++
        array[i]=$n
    fi
done
echo {array[@]} | sort -n
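For reference, here is a minimal corrected sketch of that attempt, keeping its variable names and shape: the argument test is fixed (and loosened to require at least one file), the per-file count no longer tries to execute the file, and the array handling and final expansion are repaired. It still counts characters with wc -c rather than the long words the assignment asks for:

#!/bin/bash
if [ $# -lt 1 ]                # $# is the argument count; #@ is a typo
then exit 1
fi
echo -n "Give the number>"
read k
i=0
for f in "$@"
do
    n=$(wc -c < "$f")          # `$f | wc -c` would try to execute the file
    if [ "$n" -gt "$k" ]
    then
        array[i]=$n            # the subscript is an arithmetic context
        ((i++))                # a bare i++ is not a shell command
    fi
done
printf '%s\n' "${array[@]}" | sort -n   # one value per line, so sort -n works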
Answer 1:
The challenge is:
- Write a shell script that for each file from the command line will output the number of words that are longer than the number k read from keyboard. The output must be ordered by the number of words.
I decline to prompt for input; commands take arguments. I'll go with William Pursell's suggestion that the number is the first argument, which is a reasonable solution. An alternative uses an option such as -l 23 for the length (and other options to tweak other actions).
The solutions I see so far count the number of words, but not the number of words longer than the given length. This is a problem. For that, I think awk is appropriate:
awk -v min="$k" '{ for (i = 1; i <= NF; i++) if (length($i) >= min) print $i; }'
This generates the words that are at least min characters long, one per line, on the standard output. We'll do this one file at a time, at least in the first pass. We can then count the number of such words with wc -l. Finally, we can sort the data numerically.
Putting it all together yields:
#!/bin/bash
case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac
k=${1:?"Cannot provide an empty length"}
shift
for file in "$@"
do
echo "$(awk -v min=$k '{ for (i = 1; i <= NF; i++)
if (length($i) >= min) print $i
}' "$file" |
wc -l) $file"
done | sort -n
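Saved as, say, longwords.sh (the name is arbitrary) and run against a few hypothetical files, an invocation would look like this, each output line giving the count of words of at least 6 characters followed by the file name, smallest count first:

$ ./longwords.sh 6 notes.txt chapter1.txt chapter2.txt
2 notes.txt
5 chapter1.txt
9 chapter2.txt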
This lists the files with the most long words last; that's convenient because the most interesting files are at the end of the list. If you want the high numbers first, add -r to the sort.
Of course, if we're using awk, we can improve things. It can count the number of long words in each file, and print the file name and the number, so there'd be just a single invocation of awk for all the files. It takes a little bit more programming, though:
#!/bin/sh
case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac
k=${1:?"Cannot provide an empty length"}
shift
awk -v min="$k" '
FILENAME != oldfile { if (oldfile != "") { print longwords, oldfile }
oldfile = FILENAME; longwords = 0
}
{ for (i = 1; i <= NF; i++) if (length($i) >= min) longwords++ }
END { if (oldfile != "") { print longwords, oldfile } }
' "$@" |
sort -n
If you have GNU awk, there are even ways to sort the results built into awk.
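A minimal sketch of that idea, assuming GNU awk 4.0 or later (for the BEGINFILE/ENDFILE blocks and the sorted array traversal controlled by PROCINFO["sorted_in"]):

#!/bin/sh
k=${1:?"Cannot provide an empty length"}
shift
gawk -v min="$k" '
BEGINFILE { longwords = 0 }                 # reset the counter for each file
{ for (i = 1; i <= NF; i++) if (length($i) >= min) longwords++ }
ENDFILE   { counts[FILENAME] = longwords }  # remember the count per file
END {
    PROCINFO["sorted_in"] = "@val_num_asc"  # traverse by ascending numeric value
    for (f in counts) print counts[f], f
}' "$@"

That eliminates the external sort entirely.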
Answer 2:
You can simplify the script a bit:
#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
wc -w "$@" | sed '$d' | gawk -v k="$k" '$1>k{print $0}' | sort -nr
where
- read -r -p ... prompts for and reads the input
- wc -w counts the words of all the files you passed as arguments
- sed ... deletes the last line (the total... line; but see the variant below)
- awk skips the lines whose count is not greater than $k
- sort sorts the output, highest count first
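Note that wc prints the total line only when it is given two or more files; with a single file argument, the sed '$d' above would delete the only line of output. A sketch of a variant that handles both cases by passing the number of file arguments into awk and keeping only that many lines:

wc -w "$@" | awk -v k="$k" -v n="$#" 'NR <= n && $1 > k' | sort -nr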
With the great help of @Tom Fenech here, it can be simplified to:
wc -w "$@" | awk -v k="$k" 'NR>1&&p>k{print p}{p=$1}' | sort -nr
or with filenames (based on @Wintermute's comment here), where awk stashes each qualifying line in p and prints it only when the following line is read, so the trailing total line, having no line after it, is never printed:
wc -w "$@" | awk -v k="$k" 'p { print p; p="" } $1 > k { p = $0 }' | sort -nr
EDIT
Based on @Jonathan Leffler's comment, here is a variant for counting the words that are longer than the number k in each file.
#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
let k++
grep -HoP "\b\w{${k:-3},}\b" "$@" |\
awk -F: '{f[$1]++}END{for(n in f)print f[n],n}' |\
sort -nr
Where:
- the grep ... searches for the words that are longer than the entered number (omit the let line if you want "equal or longer"). It prints out lines like:
file1:word1
file1:word2
...
file2:wordx
file2:wordy
- and the awk counts the frequency based on the 1st field, i.e. it counts the matches per filename.
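Taking the sample lines above literally (and ignoring the elided ones), the awk stage would increment f["file1"] twice and f["file2"] twice, so it would print:

2 file1
2 file2

and sort -nr would then order such lines by the leading count, highest first.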
Source: https://stackoverflow.com/questions/29184901/how-can-i-retain-numbers-for-sorting-them-later