Question
I have a problem that sounds like this: Write a shell script that, for each file given on the command line, outputs the number of words that are longer than the number k read from the keyboard. The output must be ordered by the number of words.
How can I retain the number of characters of each file, for sorting them later?
I tried something like this:
#!/bin/bash
# exit unless at least one file is given on the command line
if [ $# -lt 1 ]
then exit 1
fi
echo -n "Give the number>"
read k
i=0
for f in "$@"
do
    n=$(wc -c < "$f")        # character count of the file (not a word count)
    if [ "$n" -gt "$k" ]
    then
        ((i++))
        array[$i]=$n         # keep the count so it can be sorted later
    fi
done
echo "${array[@]}" | sort -n # everything lands on one line, so nothing is sorted
Answer 1:
The challenge is:
- Write a shell script that, for each file given on the command line, outputs the number of words that are longer than the number k read from the keyboard. The output must be ordered by the number of words.
I decline to answer prompts; commands take arguments. I'll go with William Pursell's suggestion that the number is the first argument, which is a reasonable solution. An alternative would use an option such as -l 23 for the length (and other options to tweak other behaviour).
The solutions I see so far are counting the number of words, but not the number of words longer than the given length. This is a problem. For that, I think awk is appropriate:
awk -v min="$k" '{ for (i = 1; i <= NF; i++) if (length($i) >= min) print $i; }'
This prints the words of at least min characters, one per line, on the standard output. We'll do this one file at a time, at least in the first pass.
We can then count the number of such words with wc -l. Finally, we can sort the data numerically.
Putting it all together yields:
#!/bin/bash
case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac
k=${1:?"Cannot provide an empty length"}
shift
for file in "$@"
do
echo "$(awk -v min=$k '{ for (i = 1; i <= NF; i++)
if (length($i) >= min) print $i
}' "$file" |
wc -l) $file"
done | sort -n
This lists the files with the most long words last; that's convenient because the most interesting files are at the end of the list. If you want the high numbers first, add -r to the sort.
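For example, assuming the script is saved as longwords.sh (the script name, file names and counts here are all illustrative), a run asking for words of at least 8 characters might look like:
$ ./longwords.sh 8 notes.txt draft.txt report.txt
3 notes.txt
12 draft.txt
27 report.txt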
Of course, if we're using awk, we can improve things. It can count the number of long words in each file, and print the file name and the number, so there'd be just a single invocation of awk for all the files. It takes a little bit more programming, though:
#!/bin/sh
case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac
k=${1:?"Cannot provide an empty length"}
shift
awk -v min="$k" '
FILENAME != oldfile { if (oldfile != "") { print longwords, oldfile }
oldfile = FILENAME; longwords = 0
}
{ for (i = 1; i <= NF; i++) if (length($i) >= min) longwords++ }
END { if (oldfile != "") { print longwords, oldfile } }
' "$@" |
sort -n
If you have GNU awk, there are even ways to sort the results built into awk.
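As a minimal sketch of that, assuming GNU awk 4.0 or later (PROCINFO["sorted_in"] is a gawk-only extension that controls array traversal order; the argument handling mirrors the script above):
#!/bin/sh
case "$#" in
0|1) echo "Usage: $0 length file ..." >&2; exit 1;;
esac
k=${1:?"Cannot provide an empty length"}
shift
gawk -v min="$k" '
FNR == 1 { count[FILENAME] += 0 }   # make files with no long words appear with a 0 count
{ for (i = 1; i <= NF; i++) if (length($i) >= min) count[FILENAME]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_asc"   # traverse by value, ascending numeric order
    for (file in count) print count[file], file
}
' "$@"
This removes the need for the external sort, at the cost of requiring gawk rather than any POSIX awk.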
Answer 2:
You can simplify the script a bit:
#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
wc -w "$@" | sed '$d' | gawk -v k="$k" '$1>k{print $0}' | sort -nr
where
- read -r -p ... prompts for and reads the input
- wc -w counts the words of all files that you entered as arguments
- sed ... skips the last line (the total... summary; see the sample output below)
- awk skips lines where the count is less than $k
- sort sorts the output
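For reference, wc -w given several files prints one word count per file plus a final total line; that total line is what the sed '$d' stage (or the awk look-behind variants below) removes. The counts here are illustrative:
$ wc -w file1 file2
 120 file1
  95 file2
 215 total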
With the great help of @Tom Fench here, it can be simplified to:
wc -w "$@" | awk -v k="$k" 'NR>1&&p>k{print p}{p=$1}' | sort -nr
or with filenames (based on @Wintermute's comment here)
wc -w "$@" | awk -v k="$k" 'p { print p; p="" } $1 > k { p = $0 }' | sort -nr
EDIT
Based on @Jonathan Leffler's comment, here is a variant that counts the words longer than the number k in each file.
#!/bin/bash
(( $# > 0 )) || exit
read -r -p 'Enter number > ' k
let k++
grep -HoP "\b\w{${k:-3},}\b" "$@" |
awk -F: '{f[$1]++} END{for (n in f) print f[n], n}' |
sort -nr
Where:
- the grep ... searches for the words that are longer than the entered number (omit the let k++ line if you want words equal to or longer than the number), and prints out lines like:
file1:word1
file1:word2
...
file2:wordx
file2:wordy
- and the awk counts the frequency based on the 1st field, i.e. it counts per filename.
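Supposing, for illustration, that file1 actually contained five matching words and file2 three, the awk stage would emit "5 file1" and "3 file2", and sort -nr would order them as:
5 file1
3 file2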
Source: https://stackoverflow.com/questions/29184901/how-can-i-retain-numbers-for-sorting-them-later