Algorithm - How to delete duplicate elements in a list efficiently?

执念已碎 2020-12-01 04:21

There is a list L that contains elements of arbitrary type. How can all duplicate elements be deleted from such a list efficiently? ORDE

16 answers
  • 2020-12-01 04:23
    • go through the list and assign a sequential index to each item
    • sort the list using some comparison function on the elements (use a stable sort, or break ties by index, so that the first occurrence of each value is the one kept)
    • remove adjacent duplicates
    • sort the list again by the assigned indices

    For simplicity, the indices may be stored in something like std::map.

    This looks like O(n*log n), if I haven't missed anything.
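
    The steps above can be sketched in Python as follows. This is only an illustration, assuming the elements can be ordered with <. The function name dedupe_by_sorting is hypothetical, and the sketch relies on Python's sort being stable so that the first occurrence of each value is the one kept.

```python
def dedupe_by_sorting(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    # Step 1: pair every element with its original position.
    indexed = list(enumerate(items))
    # Step 2: sort by value; the sort is stable, so equal values stay in index order.
    indexed.sort(key=lambda pair: pair[1])
    # Step 3: keep only the first entry of each run of equal values.
    kept = []
    for idx, value in indexed:
        if not kept or kept[-1][1] != value:
            kept.append((idx, value))
    # Step 4: restore the original order using the saved positions.
    kept.sort(key=lambda pair: pair[0])
    return [value for _, value in kept]


print(dedupe_by_sorting([2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]))  # [2, 1, 4, 3, 5, 6]
```

    Both sorts cost O(n*log n) and the scan between them is linear, which matches the estimate given in this answer.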

  • 2020-12-01 04:24

    Assuming order matters:

    • Create an empty set S and an empty list M.
    • Scan the list L one element at a time.
    • If the element is in the set S, skip it.
    • Otherwise, add it to M and to S.
    • Repeat for all elements in L.
    • Return M.

    In Python:

    >>> L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]
    >>> S = set()
    >>> M = []
    >>> for e in L:
    ...     if e in S:
    ...         continue
    ...     S.add(e)
    ...     M.append(e)
    ... 
    >>> M
    [2, 1, 4, 3, 5, 6]
    

    If order does not matter:

    M = list(set(L))
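
    On Python 3.7 and later, where dict is guaranteed to preserve insertion order, the order-preserving loop shown earlier in this answer can also be written with dict.fromkeys. A minimal sketch, assuming every element is hashable and reusing the same sample list:

```python
L = [2, 1, 4, 3, 5, 1, 2, 1, 1, 6, 5]

# dict.fromkeys keeps one key per distinct element, in order of first appearance.
M = list(dict.fromkeys(L))
print(M)  # [2, 1, 4, 3, 5, 6]
```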
    
  • 2020-12-01 04:25

    A generic solution close to the accepted answer. It needs only equality comparison between elements, at the cost of O(n^2) comparisons:

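    k = ['apple', 'orange', 'orange', 'grapes', 'apple', 'apple', 'apple']


    def remove_duplicates(k):
        """Remove later duplicates from k in place, using only == (O(n**2) time)."""
        m = []  # indices of elements that repeat an earlier element
        for i in range(len(k)):
            for j in range(i + 1, len(k)):
                if k[i] == k[j]:
                    m.append(j)

        # drop repeated indices, then pop from the end so earlier indices stay valid
        l = sorted(set(m), reverse=True)

        for i in l:
            k.pop(i)

        return k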
    
    
    print(remove_duplicates(k))
    
  • 2020-12-01 04:27

    In Java, it's a one-liner:

    Set<Object> set = new LinkedHashSet<>(list);  // replace Object with the list's element type
    

    This gives you a collection with duplicate items removed, in the order of their first occurrence.

  • 2020-12-01 04:27

    Delete duplicates from a list in place in Python

    Case: Items in the list are not hashable or comparable

    That is, we can't use a set (or dict) or sort.

    from itertools import islice
    
    def del_dups2(lst):
        """O(n**2) algorithm, O(1) in memory"""
        pos = 0
        for item in lst:
            if all(item != e for e in islice(lst, pos)):
                # we haven't seen `item` yet
                lst[pos] = item
                pos += 1
        del lst[pos:]
    

    Case: Items are hashable

    Solution is taken from here:

    def del_dups(seq):
        """O(n) expected time, O(n) extra memory in the worst case (all items distinct)."""
        seen = set()
        pos = 0
        for item in seq:
            if item not in seen:
                seen.add(item)
                seq[pos] = item
                pos += 1
        del seq[pos:]
    

    Case: Items are comparable, but not hashable

    That is, we can use sort. This solution doesn't preserve the original order.

    def del_dups3(lst):
        """O(n*log(n)) algorithm, no extra memory beyond what list.sort() itself uses"""
        lst.sort()
        it = iter(lst)
        for prev in it: # get the first element 
            break
        pos = 1 # start from the second element
        for item in it: 
            if item != prev: # we haven't seen `item` yet
                lst[pos] = prev = item
                pos += 1
        del lst[pos:]
    
  • 2020-12-01 04:29

    I've written an algorithm for strings. The element type does not actually matter: the same approach works for any type that supports equality comparison.

    static string removeDuplicates(string str)
    {
        if (String.IsNullOrEmpty(str) || str.Length < 2) {
            return str;
        }
    
        char[] arr = str.ToCharArray();
        int len = arr.Length;
        int pos = 1;
    
        for (int i = 1; i < len; ++i) {
    
            int j;
    
            for (j = 0; j < pos; ++j) {
                if (arr[i] == arr[j]) {
                    break;
                }
            }
    
            if (j == pos) {
                arr[pos] = arr[i];
                ++pos;
            }
        }
    
        // Build the result directly from the first `pos` characters.
        return new string(arr, 0, pos);
    }
    