Search for duplicate elements in an array

Question

This one works:

arr[0]="XX1 1"
arr[1]="XX2 2" 
arr[2]="XX3 3"
arr[3]="XX4 4"
arr[4]="XX5 5"
arr[5]="XX1 1"
arr[6]="XX7 7"
arr[7]="XX8 8"

duplicate() { printf '%s\n' "${arr[@]}" | sort -cu |& awk -F: '{ print $5 }'; }

duplicate_match=$(duplicate)

echo "array: ${arr[@]}"

# echo "duplicate: $duplicate_match"

[[ ! $duplicate_match ]] || { echo "Found duplicate:$duplicate_match"; exit 0; }

echo "no duplicate"

With the same code, this one doesn't work. Why?

arr[0]="XX"
arr[1]="wXyz" 
arr[2]="ABC"
arr[3]="XX"

Answer 1:

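sort -c only compares adjacent lines and stops at the first one that is out of order, so it finds a duplicate only when the rest of the input happens to be sorted already; in the second array the first out-of-order line is not the repeated XX. To check for duplicates, this code is much simpler and works in both cases: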

uniqueNum=$(printf '%s\n' "${arr[@]}"|awk '!($0 in seen){seen[$0];c++} END {print c}')

(( uniqueNum != ${#arr[@]} )) && echo "Found duplicates"
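As a minimal sketch, the check above can be wrapped in a function and run against both arrays from the question. The function name has_duplicates and the array names arr1, arr2 and arr3 are hypothetical, not part of the original answer, and the sketch assumes bash and elements that contain no newlines.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds (exit status 0) if any argument occurs more than once.
# Assumes no element contains a newline, since elements are fed to awk one per line.
has_duplicates() {
    local uniqueNum
    # With no arguments printf would still emit one empty line, so bail out early.
    (( $# )) || return 1
    uniqueNum=$(printf '%s\n' "$@" | awk '!($0 in seen){seen[$0];c++} END {print c}')
    (( uniqueNum != $# ))
}

arr1=("XX1 1" "XX2 2" "XX3 3" "XX4 4" "XX5 5" "XX1 1" "XX7 7" "XX8 8")
arr2=("XX" "wXyz" "ABC" "XX")
arr3=("XX" "wXyz" "ABC")   # no repeated element

has_duplicates "${arr1[@]}" && echo "arr1: found duplicates"
has_duplicates "${arr2[@]}" && echo "arr2: found duplicates"
has_duplicates "${arr3[@]}" || echo "arr3: no duplicates"
```

Because the comparison is against the number of arguments rather than against sorted order, the result does not depend on how the elements are arranged.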

EDIT: To print duplicates use this awk:

printf '%s\n' "${arr[@]}"|awk '!($0 in seen){seen[$0];next} 1'

The awk command stores each line in the array seen the first time it appears, and next skips to the following line. The 1 at the end is reached only for lines that are already in seen, so only repeated lines are printed.
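A small sketch of that behaviour, using a made-up array in which XX appears three times and ABC twice. The second awk program is a variant that is not from the answer: it prints each duplicated value only once, at its second occurrence. The variable names all_repeats and distinct_dups are hypothetical.

```shell
#!/usr/bin/env bash

arr=("XX" "wXyz" "ABC" "XX" "ABC" "XX")

# The answer's command: prints every repeated occurrence,
# so a value that appears three times is printed twice.
all_repeats=$(printf '%s\n' "${arr[@]}" | awk '!($0 in seen){seen[$0];next} 1')

# Variant (an assumption, not from the answer): the counter equals 1 only
# when a value is seen for the second time, so each duplicate prints once.
distinct_dups=$(printf '%s\n' "${arr[@]}" | awk 'seen[$0]++ == 1')

printf 'all repeats:\n%s\n' "$all_repeats"
printf 'distinct duplicates:\n%s\n' "$distinct_dups"
```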

Answer 2:

Slightly silly solution here. I just wanted to see if I could do this in a single command without explicit pipes. (I think for very large arrays/array elements, explicit pipes might actually be more efficient.)

Note that this is a test for the presence of duplicate array elements, and doesn't output the duplicates themselves, although the awk command on its own will do that. Also note that if you're unlucky enough to have array elements that contain spaces, the below won't evaluate as described.

[[ $( awk -v RS=" " ' a[$0]++ ' <<< "${arr[@]} " ) ]] && echo "dups found"

Explanation:

awk -v RS=" "

Run the awk program that follows with a single space as the record separator. Basically, this makes awk treat each array element as a separate "line".

' a[$0]++ '

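An awk program that does two things:
- Evaluate the value at key $0 in array a. If it is greater than 0, print the record; a pattern with no action prints by default, like the trailing 1 in awk ' { $1=$2 } 1 '
- Add 1 to the value at key $0 in array a. The increment happens after the old value has been tested.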

<<< "${arr[@]} "

As the input to the awk command, use the string formed by joining the elements of arr with spaces, plus an additional space at the end.
The space between } and " is really important: a here-string appends a newline, so without the extra space the final record would be the last array element with a newline attached, and it would never compare equal to an earlier identical element.

[[ $( ... ) ]]

If the awk command inside produces any output at all, the test succeeds, i.e. it returns exit status 0 (true).
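The effect of the trailing space can be checked with a short sketch on the second array from the question. The variable names with_space and without_space are hypothetical, and the sketch assumes elements without spaces, as noted above.

```shell
#!/usr/bin/env bash

arr=("XX" "wXyz" "ABC" "XX")

# With the trailing space the records are "XX", "wXyz", "ABC", "XX",
# followed by a record holding only the here-string's newline.
with_space=$(awk -v RS=" " 'a[$0]++' <<< "${arr[@]} ")

# Without it the newline sticks to the last record ("XX" plus newline),
# which differs from the first "XX", so nothing is printed.
without_space=$(awk -v RS=" " 'a[$0]++' <<< "${arr[@]}")

[[ $with_space ]] && echo "dups found (with trailing space)"
[[ $without_space ]] || echo "nothing found (without trailing space)"
```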

Source: https://stackoverflow.com/questions/22055238/search-duplicate-element-array

Tags

arrays

bash

duplicates