Question
I have a string containing duplicate words, for example:
abc, def, abc, def
How can I remove the duplicates? The string that I need is:
abc, def
Answer 1:
We have this test file:
$ cat file
abc, def, abc, def
To remove duplicate words:
$ sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//' file
abc, def
How it works
:a
This defines a label a.
s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g
This looks for a duplicated word consisting of alphanumeric characters and removes the second occurrence.
ta
If the last substitution command resulted in a change, this jumps back to label a to try again. In this way, the code keeps looking for duplicates until none remain.
s/(, )+/, /g; s/, *$//
These two substitution commands clean up any leftover comma-space combinations.
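To see why the :a/ta loop is needed, here is the same command on an input where a word appears three times; a single substitution pass would leave a duplicate behind, and the branch back to the label catches it (GNU sed assumed, for the -r option):

```shell
# Three copies of "abc": the ta branch re-runs the substitution until
# no duplicates remain, then the two cleanup passes fix the commas.
echo 'abc, def, abc, def, abc' \
  | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
# prints: abc, def
```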
Mac OS X or other BSD systems
On Mac OS X or other BSD systems, where sed does not support the -r option, use -E for extended regular expressions instead:
sed -E -e ':a' -e 's/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g' -e 'ta' -e 's/(, )+/, /g' -e 's/, *$//' file
Using a string instead of a file
sed easily handles input either from a file, as shown above, or from a shell string as shown below:
$ echo 'ab, cd, cd, ab, ef' | sed -r ':a; s/\b([[:alnum:]]+)\b(.*)\b\1\b/\1\2/g; ta; s/(, )+/, /g; s/, *$//'
ab, cd, ef
Answer 2:
You can use awk to do this.
Example:
#!/bin/bash
string="abc, def, abc, def"
string=$(printf '%s\n' "$string" | awk -v RS='[,[:space:]]+' '!a[$0]++{printf "%s%s", $0, RT}')
string="${string%,*}"
echo "$string"
Output:
abc, def
Answer 3:
This can also be done in pure Bash:
#!/bin/bash
string="abc, def, abc, def"
declare -A words
IFS=", "
for w in $string; do
words+=( [$w]="" )
done
echo ${!words[@]}
Output
def abc
Explanation
words is an associative array (declare -A words) and every word is added as
a key to it:
words+=( [${w}]="" )
(We do not need its value, so I have used "" as the value.)
The list of unique words is the list of keys (${!words[@]}).
There is one caveat, though: the output is not separated by ", ". (You would
have to iterate again. IFS is only applied when expanding ${words[*]}, and even
then only the first character of IFS is used.)
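Following the caveat above, a minimal sketch of that second iteration, joining the keys with ", " by hand (key order in a Bash associative array is unspecified, so the result may come out as "abc, def" or "def, abc"):

```shell
#!/bin/bash
string="abc, def, abc, def"
declare -A words
IFS=", "
for w in $string; do
  words+=( [$w]="" )
done
# Second pass: join the unique keys with ", " manually.
# Note: associative-array key order is unspecified.
out=""
for w in "${!words[@]}"; do
  [ -n "$out" ] && out+=", "
  out+="$w"
done
echo "$out"
```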
Answer 4:
Here is another way for this case. I changed my input string as shown below and ran this command to edit it:
#string="abc def abc def"
$ echo "abc def abc def" | xargs -n1 | sort -u | xargs | sed "s# #, #g"
abc, def
Thanks for all the support!
Answer 5:
The problem with the associative array, and with xargs and sort in the other examples, is that the words end up sorted. My solution only skips words that have already been processed; the associative array map keeps track of this.
Bash function
function uniq_words() {
local string="$1"
local delimiter=", "
local words=""
declare -A map
while read -r word; do
# skip already processed words
if [ ! -z "${map[$word]}" ]; then
continue
fi
# mark the found word
map[$word]=1
# don't add a delimiter, if it is the first word
if [ -z "$words" ]; then
words=$word
continue
fi
# add a delimiter and the word
words="$words$delimiter$word"
# split the string into lines so that we don't have
# to overwrite the $IFS system field separator
done <<< "$(sed -e "s/$delimiter/\n/g" <<< "$string")"
echo "$words"
}
Example 1
uniq_words "abc, def, abc, def"
Output:
abc, def
Example 2
uniq_words "1, 2, 3, 2, 1, 0"
Output:
1, 2, 3, 0
Example with xargs and sort
In this example, the output is sorted.
echo "1 2 3 2 1 0" | xargs -n1 | sort -u | xargs | sed "s# #, #g"
Output:
0, 1, 2, 3
Source: https://stackoverflow.com/questions/30294915/how-to-remove-duplicate-words-from-a-string-in-a-bash-script