Question:

EDIT: To be clear, we got our STDOUT from a for loop that goes something like this

for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"
    FILE_TYPE_COUNT=`echo $FILE_TYPE" | sort | uniq -c`
    echo "$FILE_TYPE_COUNT"
done

Hence our STDOUT is basically the output of the file utility printed one line at a time, rather than an actual set of strings we can copy, which is likely the root of all the issues.

So there's a pickle I absolutely can't wrap my head around.

Basically I'm creating a shell script that will print out the various file types we have in our directory. It pretty much works; however, for some odd reason, when I try to use uniq on my output, it doesn't work. This is my output:

POSIX shell script, ASCII text executable
ASCII text
Bourne-Again shell script, ASCII text executable
UTF-8 Unicode text, with overstriking
Bourne-Again shell script, ASCII text executable

Seems fairly self-explanatory, however when I use

FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`

this is the result it prints

  1 POSIX shell script, ASCII text executable
  1 ASCII text
  1 Bourne-Again shell script, ASCII text executable
  1 UTF-8 Unicode text, with overstriking
  1 Bourne-Again shell script, ASCII text executable

Obviously it should be

  1 POSIX shell script, ASCII text executable
  1 ASCII text
  2 Bourne-Again shell script, ASCII text executable
  1 UTF-8 Unicode text, with overstriking

Any idea what I'm doing wrong?

Evidently uniq doesn't treat the two duplicate lines as identical, which I assume is the fault of sort, because it can't sort my STDOUT. So any clue how to sort the list properly, ALPHABETICALLY?

Answer 1:

Your approach seems overly complicated; try this:

find $DIR -type f -exec file -b -- {} \; | sort | uniq -c

If you're not familiar with -exec, it executes the given command, in our case file -b -- {}, once per file. The placeholder {} is replaced with the path of the file currently being processed.
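As a rough illustration of the once-per-file behaviour, the sketch below counts how many paths each invocation receives. The temporary directory and the file names a, b and c are made up for the demo, and the {} + form is a standard find variant that the answer does not use; it appears here only for contrast.

```shell
# Hypothetical demo directory with three empty files (names are made up).
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"

# '\;' form: the command is started once per file, with {} replaced by one path.
per_file=$(find "$demo_dir" -type f -exec sh -c 'echo "invoked with $# file(s)"' sh {} \;)

# '+' form: find passes as many paths as fit to a single invocation.
batched=$(find "$demo_dir" -type f -exec sh -c 'echo "invoked with $# file(s)"' sh {} +)

echo "per file:"
echo "$per_file"
echo "batched:"
echo "$batched"

rm -rf "$demo_dir"
```

Since file accepts several paths in one call, find $DIR -type f -exec file -b -- {} + | sort | uniq -c should give the same counts with fewer process launches, although that variant goes beyond what the answer shows.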

Why your approach doesn't work:

You run echo "$FILE_TYPE" | sort | uniq -c inside the for loop, where $FILE_TYPE contains the type of only one file. Each pipeline therefore sees a single line, and uniq can only ever report a count of 1. You need to move the sort | uniq -c out of the loop.
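A minimal sketch of the difference, using the five type strings from the question as fixed input so that it does not depend on any real files. The sample_types helper and the variable names inside and outside are hypothetical.

```shell
# Sample type strings copied from the question's output.
sample_types() {
    printf '%s\n' \
        "POSIX shell script, ASCII text executable" \
        "ASCII text" \
        "Bourne-Again shell script, ASCII text executable" \
        "UTF-8 Unicode text, with overstriking" \
        "Bourne-Again shell script, ASCII text executable"
}

# Counting inside the loop: every sort | uniq -c pipeline sees one line.
inside=$(sample_types | while IFS= read -r t; do
    printf '%s\n' "$t" | sort | uniq -c
done)

# Counting after the loop: a single pipeline sees all five lines.
outside=$(sample_types | while IFS= read -r t; do
    printf '%s\n' "$t"
done | sort | uniq -c)

echo "inside the loop:"
echo "$inside"
echo "after the loop:"
echo "$outside"
```

The first variant prints five lines, each with a count of 1, which matches what the question reports; the second merges the two Bourne-Again entries into one line with a count of 2.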

I adjusted your code so it works:

declare -a TYPES=()
for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"`
    TYPES+=("$FILE_TYPE") # add type of current file to TYPES array
done

# TYPES now contains the types of all files and we can now count them
printf "%s\n" "${TYPES[@]}" | sort | uniq -c

Answer 2:

The issue you are seeing is because you are sorting a set of one item, for every iteration of the loop.

You'd need to sort the whole output of the loop instead.

Your (syntactically fixed) script:

for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    FILE_TYPE=`file -b "$MY_FILE"`
    FILE_TYPE_COUNT=`echo "$FILE_TYPE" | sort | uniq -c`
    echo "$FILE_TYPE_COUNT"
done

Modified to work properly:

for (( i=1; i<="$FILE_AMOUNT"; i++ )); do
    MY_FILE=`find $DIR -type f | head -$i | tail -1`
    file -b "$MY_FILE"
done | sort | uniq -c

Optimised once:

for FILE in $(find $DIR -type f); do
    file -b "$FILE"
done | sort | uniq -c

Optimised twice (See @P. Gerber's Answer):

find $DIR -type f -exec file -b -- {} \; | sort | uniq -c

Your original script is horrifically inefficient.

Notes on efficiency & operation:

${FILE_AMOUNT} has to be correct for the loop to iterate over the whole dataset.
On every iteration you run find, which returns the whole dataset, and then discard everything you're not interested in.
You are running sort and uniq on every iteration, on a dataset of size one.
As you are constantly re-computing your dataset, if it changes halfway through your script's execution (e.g. a file or directory is created or deleted), your results become invalid.
Remember that every time you start a new program, you pay a performance penalty. This is exacerbated by the fact that you are continually computing your dataset and then discarding "everything that you don't want".
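One way to address several of these notes at once is to run find a single time and iterate over a saved snapshot of its output. The sketch below is only an illustration under stated assumptions: the demo tree and its file names are made up, no file name is assumed to contain a newline, and the file utility is called only if it is installed.

```shell
# Hypothetical demo tree; the file names are made up for illustration.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\n' > "$demo_dir/run.sh"
printf 'plain text\n' > "$demo_dir/notes.txt"
printf 'more text\n' > "$demo_dir/name with spaces.txt"

# Run find exactly once and keep its output as a snapshot, so no separate
# FILE_AMOUNT counter is needed and later changes to the tree are ignored.
# Assumption: no file name contains a newline.
snapshot=$(mktemp)
find "$demo_dir" -type f > "$snapshot"
file_count=$(wc -l < "$snapshot" | tr -d ' ')
spaced=$(grep -c 'name with spaces\.txt$' "$snapshot")
echo "files in snapshot: $file_count"

# Walk the snapshot line by line; IFS= and -r keep spaces and backslashes intact.
# sort and uniq -c run once, on the output of the whole loop.
while IFS= read -r path; do
    # Only call file(1) if it is installed on this machine.
    if command -v file >/dev/null 2>&1; then
        file -b -- "$path"
    fi
done < "$snapshot" | sort | uniq -c

rm -rf "$demo_dir" "$snapshot"
```

Unlike the for FILE in $(find ...) form, reading the snapshot with while read does not split a path such as "name with spaces.txt" into several words.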

Answer 3:

In addition to the other good solutions here, be sure to understand the sorting rule set that you are using. To inspect your current sorting rule, you can do:

echo anything | sort --debug

to see your results with annotations. Consider:

echo -e "a 2\na1" | sort --debug
sort: using ‘en_US.UTF-8’ sorting rules
a1
__
a 2
___

Note that this rule set may sort in a perhaps unexpected order. If you're looking for a simple byte comparison, then be sure to set LC_ALL=C as in:

LC_ALL=C sort

For example:

echo -e "a 2\na1" | LC_ALL=C sort --debug
sort: using simple byte comparison
a 2
___
a1
__

The use of LC_ALL is important in getting the results you expect. Lastly, run the locale command and read the man page to get locale-specific information.
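A small sketch of the same comparison without --debug, in case your sort does not support that option. The variable names are hypothetical, and the result for the current locale is deliberately not stated, since it depends on the environment.

```shell
# Show which collation setting is in effect: LC_ALL overrides LC_COLLATE,
# which in turn overrides LANG.
echo "collation setting: ${LC_ALL:-${LC_COLLATE:-${LANG:-unset}}}"

# Byte-wise comparison: the space (0x20) sorts before the digit 1 (0x31).
byte_sorted=$(printf 'a 2\na1\n' | LC_ALL=C sort)

# Whatever the environment's locale dictates; the order may differ.
locale_sorted=$(printf 'a 2\na1\n' | sort)

printf 'LC_ALL=C:\n%s\n' "$byte_sorted"
printf 'current locale:\n%s\n' "$locale_sorted"
```

Setting LC_ALL=C only for the sort command, as above, leaves the rest of the script's locale untouched.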

Source: Any idea why sort utility gives me incorrect results?
