Question:
EDIT: To be clear, we got our STDOUT from a for loop that goes something like this:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"`
    FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
    echo "$FILE_TYPE_COUNT"
done
Hence our STDOUT is basically the output of the file utility printed one line at a time, rather than an actual set of strings we can copy, which is likely the cause of all of the issues.
So there's a pickle I absolutely can't wrap my head around.
Basically I'm creating a shell script that will print out the various file types we have in our directory. It pretty much works; however, for some odd reason, when I try to use uniq on my output, it doesn't work. This is my output:
POSIX shell script, ASCII text executable
ASCII text
Bourne-Again shell script, ASCII text executable
UTF-8 Unicode text, with overstriking
Bourne-Again shell script, ASCII text executable
Seems fairly self-explanatory, however when I use
FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
this is the result it prints
1 POSIX shell script, ASCII text executable
1 ASCII text
1 Bourne-Again shell script, ASCII text executable
1 UTF-8 Unicode text, with overstriking
1 Bourne-Again shell script, ASCII text executable
Obviously it should be
1 POSIX shell script, ASCII text executable
1 ASCII text
2 Bourne-Again shell script, ASCII text executable
1 UTF-8 Unicode text, with overstriking
Any idea what I'm doing wrong?
Obviously uniq thinks the lines are different, which I assume is the fault of sort, because it can't sort my STDOUT. So any clue how to sort the list properly, in alphabetical order?
Answer 1:
Your approach seems overly complicated; try this:
find $DIR -type f -exec file -b -- {} \; | sort | uniq -c
If you're not familiar with -exec, it executes the given command, in our case file -b -- {}, once per file. The placeholder {} is replaced with the path to the file currently being processed.
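As an aside that is not part of the original answer: POSIX find also accepts + in place of \; as the -exec terminator, which passes many paths to a single invocation of the command instead of starting one process per file. The sketch below makes the difference visible. The scratch directory from mktemp and the small sh -c command that prints its argument count are illustrative assumptions, not something from the question.

```shell
# Illustrative setup: a scratch directory with three empty files.
tmp=$(mktemp -d)
touch "$tmp/a" "$tmp/b" "$tmp/c"

# With \; the command runs once per file, so each run sees exactly 1 argument.
per_file=$(find "$tmp" -type f -exec sh -c 'echo "$#"' _ {} \;)

# With + the paths are batched, so a single run sees all 3 arguments.
batched=$(find "$tmp" -type f -exec sh -c 'echo "$#"' _ {} +)

rm -rf "$tmp"
printf 'one process per file:\n%s\nbatched:\n%s\n' "$per_file" "$batched"
```

Applied to the command above, that would read find $DIR -type f -exec file -b -- {} + | sort | uniq -c. Since file prints one line per path it is given, the counts should come out the same.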
Why your approach doesn't work:
You run echo "$FILE_TYPE" | sort | uniq -c within the for loop, where $FILE_TYPE contains only the file type of one file at that point. You need to move the sort | uniq -c out of the loop.
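To see the effect in isolation, here is a minimal sketch that replaces the calls to file with the five type strings from the question's output (the variable names are made up for illustration). It counts once inside the loop and once after it.

```shell
# Sample `file -b` results, copied from the question's output.
types='POSIX shell script, ASCII text executable
ASCII text
Bourne-Again shell script, ASCII text executable
UTF-8 Unicode text, with overstriking
Bourne-Again shell script, ASCII text executable'

# Counting inside the loop: uniq -c only ever sees one line, so every count is 1.
per_line=$(printf '%s\n' "$types" | while IFS= read -r t; do
    printf '%s\n' "$t" | sort | uniq -c
done)

# Counting after the loop: sort makes the duplicates adjacent and uniq -c merges them.
whole=$(printf '%s\n' "$types" | while IFS= read -r t; do
    printf '%s\n' "$t"
done | sort | uniq -c)

printf 'inside the loop:\n%s\n\nafter the loop:\n%s\n' "$per_line" "$whole"
```

The first result has five lines, each with a count of 1. The second has four lines, with a count of 2 for the Bourne-Again entry.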
I adjusted your code so it works:
declare -a TYPES=()
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"`
    TYPES+=("$FILE_TYPE")  # add type of current file to TYPES array
done
# TYPES now contains the types of all files and we can now count them
printf "%s\n" "${TYPES[@]}" | sort | uniq -c
Answer 2:
The issue you are seeing is because you are sorting a set of one item, for every iteration of the loop.
You'd need to sort the whole output of the loop instead.
Your (syntactically fixed) script:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"`
    FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
    echo "$FILE_TYPE_COUNT"
done
Modified to work properly:
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    file -b "$MY_FILE"
done | sort | uniq -c
Optimised once:
for FILE in $(find $DIR -type f); do
    file -b "$FILE"
done | sort | uniq -c
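One caveat with this form, offered as an editorial note rather than something the answer states: the unquoted $(find ...) is subject to word splitting, so a path containing a space reaches the loop in pieces. A minimal sketch follows. The scratch directory and the file name are invented for illustration, and the sketch assumes the temporary path itself contains no whitespace or glob characters.

```shell
# Illustrative setup: one file whose name contains a space.
tmp=$(mktemp -d)
touch "$tmp/my file.txt"

# Word splitting: the single path is broken at the space into separate words.
split_count=0
for f in $(find "$tmp" -type f); do
    split_count=$((split_count + 1))
done

# -exec hands the path to the command as one argument, space included.
exec_count=$(find "$tmp" -type f -exec sh -c 'echo "$#"' _ {} \;)

rm -rf "$tmp"
echo "for-loop words: $split_count, -exec arguments: $exec_count"
```

The -exec form shown in the next step avoids the problem because find passes each path directly to the command without going through the shell's word splitting.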
Optimised twice (See @P. Gerber's Answer):
find $DIR -type f -exec file -b -- {} \; | sort | uniq -c
Your original script is horrifically inefficient.
Notes on efficiency & operation:
- ${FILE_AMOUNT} has to be correct to iterate over the whole dataset
- You are running find, which returns the whole dataset, and then discarding everything that you're not interested in, every iteration
- You are running sort and uniq, on every iteration, on a dataset of size one
- As you are constantly re-computing your dataset, if it changes halfway through your script's execution (e.g. a file or directory is created or deleted), then your results will become invalid
- Remember that every time you start a new program, you pay a performance penalty; this is exacerbated by the fact that you are continually computing your dataset and then discarding "everything that you don't want"
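Several of these notes can be addressed together by listing the files once and looping over that saved list. The following is a minimal sketch under stated assumptions: the scratch directory, the file names and the list file are hypothetical, and it assumes no path contains a newline.

```shell
# Illustrative setup: a scratch directory with two files.
tmp=$(mktemp -d)
touch "$tmp/one" "$tmp/two"

# Take the snapshot: this is the only time find runs.
list=$(mktemp)
find "$tmp" -type f > "$list"

# A file created after the snapshot does not change what the loop sees.
touch "$tmp/three"

# No FILE_AMOUNT counter is needed: the loop ends when the list does.
seen=0
while IFS= read -r path; do
    seen=$((seen + 1))
done < "$list"

rm -rf "$tmp" "$list"
echo "files processed: $seen"
```

In a real script the loop body would call file -b -- "$path", with sort | uniq -c attached after done as in the versions above.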
Answer 3:
In addition to the other good solutions here, be sure to understand the sorting rule set that you are using. To inspect your current sorting rule, you can do:
echo anything | sort --debug
to see your results with annotations. Consider:
echo -e "a 2\na1" | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
a1
__
a 2
___
Note that this rule set may sort in an order you don't expect. If you're looking for a simple byte comparison, then be sure to set LC_ALL=C as in:
LC_ALL=C sort
For example:
echo -e "a 2\na1" | LC_ALL=C sort --debug
sort: using simple byte comparison
a 2
___
a1
__
The use of LC_ALL is important in getting the results you expect. Lastly, run the locale command and read the man page to get locale-specific information.
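As a small sketch of how this could be carried into the counting pipeline (the variable names are made up, and setting LC_ALL=C on uniq as well is a suggestion here, not something stated above):

```shell
# Same two lines as in the example above.
input='a 2
a1'

# Byte order: the space (0x20) sorts before the digit 1 (0x31), so "a 2" comes first.
c_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort)

# Feed the input twice, using the same setting for both stages so that
# sort and uniq compare lines the same way.
counts=$(printf '%s\n' "$input" "$input" | LC_ALL=C sort | LC_ALL=C uniq -c)

printf '%s\n\n%s\n' "$c_sorted" "$counts"
```

With the input fed in twice, each of the two distinct lines is reported with a count of 2.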