There is a list L. Each of its elements may be of arbitrary type. How can all duplicate elements be deleted from such a list efficiently? ORDE
For simplicity, the indices of items may be stored in something like std::map.
That looks like O(n*log n), if I haven't missed anything.
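The std::map comment can be sketched in Python. The standard library has no ordered tree map, so this sketch swaps in sorting (item, index) pairs instead, which gives the same O(n*log n) bound. It assumes all elements are mutually comparable with <, as a std::map key would also require. dedupe_by_sorting is a hypothetical name.

```python
def dedupe_by_sorting(items):
    """Remove duplicates, keeping first occurrences in their original order.

    Assumes all items are mutually comparable with < and ==.
    O(n log n) time because of the two sorts.
    """
    # Sort indices by (value, original position): equal values become
    # adjacent, with the earliest occurrence first in each run.
    order = sorted(range(len(items)), key=lambda i: (items[i], i))
    keep = []  # index of the first occurrence of each distinct value
    for pos, i in enumerate(order):
        if pos == 0 or items[order[pos - 1]] != items[i]:
            keep.append(i)
    keep.sort()  # restore the original order of the surviving items
    return [items[i] for i in keep]


print(dedupe_by_sorting([2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]))  # [2, 1, 4, 3, 5, 6]
```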
Assuming order matters:
In Python:
>>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
>>> S = set()
>>> M = []
>>> for e in L:
...     if e in S:
...         continue
...     S.add(e)
...     M.append(e)
...
>>> M
[2, 1, 4, 3, 5, 6]
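The same loop can be packaged as a reusable generator. This is a sketch with a hypothetical name, unique_in_order (the itertools documentation lists a similar recipe called unique_everseen); it assumes the elements are hashable, because set membership hashes them.

```python
def unique_in_order(iterable):
    """Yield each element the first time it appears (elements must be hashable)."""
    seen = set()
    for e in iterable:
        if e not in seen:  # O(1) average-case membership test
            seen.add(e)
            yield e


L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
print(list(unique_in_order(L)))  # [2, 1, 4, 3, 5, 6]
```

Because it is a generator, it also works on iterators that can only be consumed once.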
If order does not matter:
M = list(set(L))
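Note that list(set(L)) returns the items in no particular order. When order does matter, dict.fromkeys is a common one-line alternative, since dicts keep insertion order (guaranteed since Python 3.7). A minimal sketch, assuming hashable elements:

```python
L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]

# set() makes no guarantee about the order of the result.
unordered = list(set(L))

# Dict keys are unique and keep insertion order, so this keeps the
# first occurrence of each element in its original position.
ordered = list(dict.fromkeys(L))
print(ordered)  # [2, 1, 4, 3, 5, 6]
```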
A generic solution close to the accepted answer:
k = ['apple', 'orange', 'orange', 'grapes', 'apple', 'apple', 'apple']

def remove_duplicates(k):
    m = []  # indices of elements equal to some earlier element
    for i in range(len(k)):
        for j in range(i, len(k) - 1):
            if k[i] == k[j + 1]:
                m.append(j + 1)
    # Pop from the highest index down so the remaining indices stay valid.
    for i in sorted(set(m), reverse=True):
        k.pop(i)
    return k

print(remove_duplicates(k))  # ['apple', 'orange', 'grapes']
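When elements are both unhashable and unorderable (lists, dicts), neither set nor sort applies. A simpler O(n**2) sketch relies only on == and leaves the input list untouched; remove_duplicates_eq is a hypothetical name.

```python
def remove_duplicates_eq(items):
    """Return a new list without duplicates, using only == comparisons.

    Works for unhashable, unorderable elements such as lists or dicts,
    at the cost of O(n**2) comparisons. The input list is not modified.
    """
    result = []
    for x in items:
        if x not in result:  # linear scan of the kept items using ==
            result.append(x)
    return result


print(remove_duplicates_eq([[1], {'a': 1}, [1], {'a': 1}, [2]]))
# [[1], {'a': 1}, [2]]
```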
In Java, it's a one-liner:
Set<T> set = new LinkedHashSet<>(list);
will give you a collection (here list is a List<T>) with duplicate items removed, keeping the order of first occurrence.
Suppose we can't use set (dict) or sort.
from itertools import islice

def del_dups2(lst):
    """O(n**2) algorithm, O(1) in memory"""
    pos = 0
    for item in lst:
        if all(item != e for e in islice(lst, pos)):
            # we haven't seen `item` yet
            lst[pos] = item
            pos += 1
    del lst[pos:]
Suppose we can use dict. The following solution is taken from here:
def del_dups(seq):
    """O(n) expected time, O(n) extra memory for `seen`."""
    seen = {}
    pos = 0
    for item in seq:
        if item not in seen:
            seen[item] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
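del_dups raises TypeError on unhashable items such as lists. Since the question allows elements of arbitrary type, one option is a hybrid: a set for hashable items and a linear == scan for the rest. This is a sketch under that assumption; del_dups_mixed is a hypothetical name, and it assumes a hashable item never compares equal to an unhashable one.

```python
def del_dups_mixed(seq):
    """Remove duplicates from seq in place, keeping first occurrences.

    Hashable items are tracked in a set (O(1) average lookups); unhashable
    items fall back to a list that is scanned linearly with ==.
    """
    seen_hashable = set()
    seen_unhashable = []
    pos = 0
    for item in seq:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:  # item cannot be hashed, e.g. a list or dict
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        seq[pos] = item  # pos never passes the current index, so this is safe
        pos += 1
    del seq[pos:]


data = [1, [1], 1, [1], 'a', {'k': 2}, 'a', {'k': 2}]
del_dups_mixed(data)
print(data)  # [1, [1], 'a', {'k': 2}]
```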
Suppose we can use sort. This solution doesn't preserve the original order.
def del_dups3(lst):
    """O(n*log(n)) algorithm, O(1) extra memory apart from lst.sort()"""
    lst.sort()
    it = iter(lst)
    for prev in it:  # get the first element (loop body skipped if lst is empty)
        break
    pos = 1  # start from the second element
    for item in it:
        if item != prev:  # we haven't seen `item` yet
            lst[pos] = prev = item
            pos += 1
    del lst[pos:]
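If modifying the list in place is not required, the sort-based idea can be written with itertools.groupby: after sorting, equal items are adjacent, and groupby yields one key per run of equal items. A sketch assuming orderable elements; sorted_unique is a hypothetical name.

```python
from itertools import groupby


def sorted_unique(items):
    """Return the distinct items in sorted order (items must be orderable).

    O(n log n) time for the sort; builds a new list instead of
    modifying the input.
    """
    return [key for key, _group in groupby(sorted(items))]


print(sorted_unique([2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]))  # [1, 2, 3, 4, 5, 6]
```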
I've written an algorithm for strings, but it works the same way whatever the element type is.
static string removeDuplicates(string str)
{
    if (String.IsNullOrEmpty(str) || str.Length < 2) {
        return str;
    }
    char[] arr = str.ToCharArray();
    int len = arr.Length;
    int pos = 1; // arr[0..pos) holds the distinct characters kept so far
    for (int i = 1; i < len; ++i) {
        int j;
        // Linear scan of the kept prefix: O(n^2) comparisons overall.
        for (j = 0; j < pos; ++j) {
            if (arr[i] == arr[j]) {
                break;
            }
        }
        if (j == pos) { // arr[i] was not seen before, so keep it
            arr[pos] = arr[i];
            ++pos;
        }
    }
    // Build the result from the kept prefix (avoids O(n^2) string concatenation).
    return new string(arr, 0, pos);
}
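For comparison, here is a Python sketch of the same task (keep the first occurrence of each character, in order). It swaps the nested scan for dict.fromkeys, which is O(n) on average; remove_duplicate_chars is a hypothetical name.

```python
def remove_duplicate_chars(s: str) -> str:
    """Keep the first occurrence of each character, preserving order.

    Dict keys are unique and keep insertion order (Python 3.7+).
    """
    return "".join(dict.fromkeys(s))


print(remove_duplicate_chars("mississippi"))  # misp
```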