ConcurrentDictionary with multiple values per key, removing empty entries

本秂侑毒 submitted on 2020-05-14 14:18:09

Question


ConcurrentDictionary works well for concurrent situations when mapping keys to a single value each. When mapping to multiple values, it is easy to create a ConcurrentDictionary<K, List<V>> and guard its addition/removal functions.

ConcurrentDictionary <string, List<string>> d;

// Add
var list = d.GetOrAdd ("key", x => new List<string> ());
lock (list) {
    list.Add ("value to add");
}

// Remove
if (d.TryGetValue ("key", out var list)) {
    lock (list) {
        list.Remove ("value to remove");
    }
}

However, the above assumes that empty lists are allowed to stay. I don't want that. But removing empty pairs does not seem to be possible in an atomic fashion. One might try:

if (d.TryGetValue ("key", out var list)) {
    lock (list) {
        if (list.Remove ("value to remove") && list.Count == 0) {
            d.TryRemove ("key", out _);
        }
    }
}

But this has a race condition: another thread can grab the list before it is emptied and removed elsewhere, but add to it afterwards:

  • A: get list
  • B: get list
  • B: lock, remove from list
  • B: list is empty, delete key, unlock
  • A: lock, add to list, unlock
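To make the interleaving concrete, here is a sequential replay of it. This is a sketch: no real threads are involved, and `listA` and `listB` are hypothetical stand-ins for the references held by threads A and B. It shows that the value added by A ends up in a list that is no longer reachable from the dictionary.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Sequential replay of the interleaving; listA and listB stand in for what threads A and B hold.
var d = new ConcurrentDictionary<string, List<string>>();
d["key"] = new List<string> { "value to remove" };

// A: get list
List<string> listA = d.GetOrAdd("key", _ => new List<string>());

// B: get list; lock, remove from list; list is empty, delete key, unlock
if (d.TryGetValue("key", out var listB))
{
    lock (listB)
    {
        if (listB.Remove("value to remove") && listB.Count == 0)
        {
            d.TryRemove("key", out _);
        }
    }
}

// A: lock, add to list, unlock (A still holds the list that B has just detached)
lock (listA)
{
    listA.Add("value to add");
}

// The added value is stranded: the dictionary no longer has an entry for "key".
Console.WriteLine(d.ContainsKey("key")); // False
```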

Locking on the dictionary is not possible (it's a different use case).

As far as I can tell, a solution would usually be found using compare-and-swap operations and replacing the list with e.g. an immutable array that is then replaced in its entirety. However, given that ConcurrentDictionary does not offer a TryRemove with an expected value to compare against, I don't quite see how. Possibly there is a two-stage solution?

Using the out parameter of TryRemove to add values again after removing them (to fix race cases) is not possible - the dictionary would briefly be in an inconsistent state.

There are many questions on this site about similar scenarios, but most of them suffer from trivial mistakes or do not remove empty entries. There is one highly related question which asks whether this is possible at all. Sadly, it is five years old, received very little attention and has no solution apart from resorting to locks (which defeats the purpose). Perhaps a better way has opened up since then.


(Edited example for clarity)


Answer 1:


I managed to implement a ConcurrentMultiDictionary class that stores multiple values per key, with empty entries removed. The values of each key are stored in a HashSet, so each key has unique values. This keeps removing a value fast even when a key has many values. If uniqueness is a problem, the HashSet should be replaced with a List, and the Add method should be modified to return void instead of bool.

The atomicity of the add and remove operations is achieved by spinning. When a bag of values becomes empty, it is flagged as "discarded". Adding values to a discarded bag is not allowed, so the Add operation spins until it grabs a non-discarded bag. The Remove operation spins too. So the only thread that is allowed to remove a discarded bag from the dictionary is the same thread that marked the bag as discarded; all other threads spin until that happens. SpinWait structs are used for the spinning, to ensure efficiency even on single-processor machines.

One problem I could not solve with this implementation is a ToArray method that takes a snapshot of all keys and values stored in the dictionary. The ConcurrentDictionary.ToArray method returns a snapshot of the keys together with references to their bags, but the contents of the bags can be constantly changing, which is why I believe it is unsolvable.
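A small single-threaded illustration of the ToArray point (a sketch; the key "a" and its values are made up): the array returned by `ConcurrentDictionary.ToArray` is a point-in-time copy of the key/value pairs, but each value is only a reference to the live collection, so later changes to a bag show through the "snapshot".

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var d = new ConcurrentDictionary<string, List<int>>();
d["a"] = new List<int> { 1, 2 };

// ToArray copies the key/value pairs, but each value is just a reference to the live list.
KeyValuePair<string, List<int>>[] snapshot = d.ToArray();

// A later mutation of the list (standing in for another thread) is visible through the snapshot.
lock (d["a"]) { d["a"].Add(3); }

Console.WriteLine(snapshot[0].Value.Count); // 3
```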

Even implementing the IEnumerable interface is a bit tricky, because if we just enumerate the KeyValuePairs of the underlying dictionary, many of the bags could already be discarded by the time their values are read. So during the enumeration the bag of each key is retrieved individually, to be as current as possible.

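using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;      // Enumerable.ToArray on HashSet<T>
using System.Threading; // SpinWait

public class ConcurrentMultiDictionary<TKey, TValue>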
    : IEnumerable<KeyValuePair<TKey, TValue[]>>
{
    private class Bag : HashSet<TValue>
    {
        public bool IsDiscarded { get; set; }
    }

    private readonly ConcurrentDictionary<TKey, Bag> _dictionary;

    public ConcurrentMultiDictionary()
    {
        _dictionary = new ConcurrentDictionary<TKey, Bag>();
    }

    public int Count => _dictionary.Count;

    public bool Add(TKey key, TValue value)
    {
        var spinWait = new SpinWait();
        while (true)
        {
            var bag = _dictionary.GetOrAdd(key, _ => new Bag());
            lock (bag)
            {
                if (!bag.IsDiscarded) return bag.Add(value);
            }
            spinWait.SpinOnce();
        }
    }

    public bool Remove(TKey key, TValue value)
    {
        var spinWait = new SpinWait();
        while (true)
        {
            if (!_dictionary.TryGetValue(key, out var bag)) return false;
            bool spinAndRetry = false;
            lock (bag)
            {
                if (bag.IsDiscarded)
                {
                    spinAndRetry = true;
                }
                else
                {
                    bool valueRemoved = bag.Remove(value);
                    if (!valueRemoved) return false;
                    if (bag.Count != 0) return true;
                    bag.IsDiscarded = true;
                }
            }
            if (spinAndRetry) { spinWait.SpinOnce(); continue; }
            bool keyRemoved = _dictionary.TryRemove(key, out var currentBag);
            Debug.Assert(keyRemoved, $"Key {key} was not removed");
            Debug.Assert(bag == currentBag, $"Removed wrong bag");
            return true;
        }
    }

    public bool TryGetValues(TKey key, out TValue[] values)
    {
        if (!_dictionary.TryGetValue(key, out var bag)) { values = null; return false; }
        bool isDiscarded;
        lock (bag) { isDiscarded = bag.IsDiscarded; values = bag.ToArray(); }
        if (isDiscarded) { values = null; return false; }
        return true;
    }

    public bool Contains(TKey key, TValue value)
    {
        if (!_dictionary.TryGetValue(key, out var bag)) return false;
        lock (bag) return !bag.IsDiscarded && bag.Contains(value);
    }

    public bool ContainsKey(TKey key) => _dictionary.ContainsKey(key);

    public ICollection<TKey> Keys => _dictionary.Keys;

    public IEnumerator<KeyValuePair<TKey, TValue[]>> GetEnumerator()
    {
        foreach (var key in _dictionary.Keys)
        {
            if (this.TryGetValues(key, out var values))
            {
                yield return new KeyValuePair<TKey, TValue[]>(key, values);
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

This implementation was tested with 8 concurrent workers mutating a dictionary a million times per worker, and no inconsistency regarding the reported number of additions and removals was observed.




Answer 2:


There seems to be no practical way of removing an empty collection (even a synchronized one) from a concurrent dictionary without race conditions. Several facts prevent this, as discussed in the comments under both the question and the OP's self-answer.

What I wrote in my comment, however, seemed feasible and I wanted to give it a try.

I discuss the drawbacks of this implementation right after the usage example, and I would value your comments on it most of all.

First, the usage:

    static void Main(string[] args)
    {
        var myDictionary = new ConcurrentDictionary<string, IList<int>>();
        IList<int> myList = myDictionary.AddSelfRemovingList<string, int>("myList");

        myList.Add(5);
        myList.Add(6);

        myList.Remove(6);
        myList.Remove(5);

        IList<int> existingInstance;

        // returns false:
        bool exists = myDictionary.TryGetValue("myList", out existingInstance);

        // throws HasAlreadyRemovedSelfException:
        myList.Add(3);
    }

AddSelfRemovingList is an extension method to make things easier.

For the discussion part:

  1. It is not acceptable for the removal of an item from a collection to have a side effect of removing the collection reference from the owning dictionary.
  2. It is also not good practice to make the collection obsolete (unusable) when all its items are removed. There is a strong possibility that the consumer of the collection wants to clear and re-fill the collection and this implementation does not allow that.
  3. It forces the use of the IList<T> abstraction and a custom implementation instead of List<T>.

Although this provides a truly thread-safe way of removing a just-emptied collection from the dictionary, it seems to have more cons than pros. It should only be used in a closed context where the collections inside the concurrent dictionary are not exposed to the outside, and where removing a collection immediately when it is emptied, even if another thread is accessing it at that moment, is essential.

Here is the extension method to create and add the self removing list to the dictionary:

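using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.CompilerServices; // MethodImplOptions.Synchronized

public static class ConcurrentDictionaryExtensions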
{
    public static IList<TValue> AddSelfRemovingList<TKey, TValue>(this ConcurrentDictionary<TKey, IList<TValue>> dictionaryInstance, TKey key)
    {
        var newInstance = new SelfRemovingConcurrentList<TKey, TValue>(dictionaryInstance, key);
        if (!dictionaryInstance.TryAdd(key, newInstance))
        {
            // ArgumentException takes (message, paramName) in that order
            throw new ArgumentException("The passed key already exists in the parent dictionary.", nameof(key));
        }
        return newInstance;
    }
}

And finally, here is the synchronized, self-removing implementation of IList<T>:

public class SelfRemovingConcurrentList<TKey, TValue> : IList<TValue>
{
    private ConcurrentDictionary<TKey, IList<TValue>> owner;
    private TKey ownerAccessKey;
    List<TValue> underlyingList = new List<TValue>();
    private bool hasRemovedSelf;

    public class HasAlreadyRemovedSelfException : Exception
    {

    }

    internal SelfRemovingConcurrentList(ConcurrentDictionary<TKey, IList<TValue>> owner, TKey ownerAccessKey)
    {
        this.owner = owner;
        this.ownerAccessKey = ownerAccessKey;
    }

    private void ThrowIfHasAlreadyRemovedSelf()
    {
        if (hasRemovedSelf)
        {
            throw new HasAlreadyRemovedSelfException();
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    int IList<TValue>.IndexOf(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.IndexOf(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void IList<TValue>.Insert(int index, TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Insert(index, item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void IList<TValue>.RemoveAt(int index)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.RemoveAt(index);
        if (underlyingList.Count == 0)
        {
            hasRemovedSelf = true;
            IList<TValue> removedInstance;
            if (!owner.TryRemove(ownerAccessKey, out removedInstance))
            {
                // Just ignore.
                // What we want to do is to remove ourself from the owner (concurrent dictionary)
                // and it seems like we have already been removed!
            }
        }
    }

    TValue IList<TValue>.this[int index]
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return underlyingList[index];
        }
        [MethodImpl(MethodImplOptions.Synchronized)]
        set
        {
            ThrowIfHasAlreadyRemovedSelf();
            underlyingList[index] = value;
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.Add(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Add(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.Clear()
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.Clear();
        hasRemovedSelf = true;
        IList<TValue> removedInstance;
        if (!owner.TryRemove(ownerAccessKey, out removedInstance))
        {
            // Just ignore.
            // What we want to do is to remove ourself from the owner (concurrent dictionary)
            // and it seems like we have already been removed!
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    bool ICollection<TValue>.Contains(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.Contains(item);
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    void ICollection<TValue>.CopyTo(TValue[] array, int arrayIndex)
    {
        ThrowIfHasAlreadyRemovedSelf();
        underlyingList.CopyTo(array, arrayIndex);
    }

    int ICollection<TValue>.Count
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return underlyingList.Count;
        }
    }

    bool ICollection<TValue>.IsReadOnly
    {
        [MethodImpl(MethodImplOptions.Synchronized)]
        get
        {
            ThrowIfHasAlreadyRemovedSelf();
            return false;
        }
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    bool ICollection<TValue>.Remove(TValue item)
    {
        ThrowIfHasAlreadyRemovedSelf();
        bool removalResult = underlyingList.Remove(item);
        if (underlyingList.Count == 0)
        {
            hasRemovedSelf = true;
            IList<TValue> removedInstance;
            if (!owner.TryRemove(ownerAccessKey, out removedInstance))
            {
                // Just ignore.
                // What we want to do is to remove ourself from the owner (concurrent dictionary)
                // and it seems like we have already been removed!
            }
        }
        return removalResult;
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    IEnumerator<TValue> IEnumerable<TValue>.GetEnumerator()
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.GetEnumerator();
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    IEnumerator IEnumerable.GetEnumerator()
    {
        ThrowIfHasAlreadyRemovedSelf();
        return underlyingList.GetEnumerator();
    }
}



Answer 3:


The question can be solved by using a dictionary that offers a variant of TryRemove that first checks that the current value is equal to an expected value. Only if the values compare equal is the key removed (atomically); otherwise, the operation returns failure.

It turns out ConcurrentDictionary already implements exactly this functionality:

/// <summary>
/// Removes the specified key from the dictionary if it exists and returns its associated value.
/// If matchValue flag is set, the key will be removed only if is associated with a particular
/// value.
/// </summary>
/// <param name="key">The key to search for and remove if it exists.</param>
/// <param name="value">The variable into which the removed value, if found, is stored.</param>
/// <param name="matchValue">Whether removal of the key is conditional on its value.</param>
/// <param name="oldValue">The conditional value to compare against if <paramref name="matchValue"/> is true</param>
/// <returns></returns>
private bool TryRemoveInternal(TKey key, out TValue value, bool matchValue, TValue oldValue)

The public TryRemove calls this method with matchValue set to false. Sadly, the method itself is not exposed (it is private). A simple solution would thus be to copy the existing class and make this method public. I'm not sure why it was not exposed; if the functionality were not working well, matchValue would most likely have been removed long ago.

As @Theodor Zoulias notes, it is also possible to invoke the private TryRemoveInternal method by using reflection. As far as I know, this can be done without a major impact on performance.
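As a side note, the conditional removal is also reachable without reflection or a copied class: ConcurrentDictionary implements `ICollection<KeyValuePair<TKey, TValue>>.Remove` explicitly, and that implementation removes the entry only if the stored value equals the given one (compared with `EqualityComparer<TValue>.Default`, which for `List<T>` means reference equality). Since .NET 5 the same operation is also public as `TryRemove(KeyValuePair<TKey, TValue>)`. A small sketch with a made-up key:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var d = new ConcurrentDictionary<string, List<int>>();
var original = new List<int>();
d["key"] = original;

// Cast to the interface to reach the explicit, value-checking Remove implementation.
var collection = (ICollection<KeyValuePair<string, List<int>>>)d;

// A different list instance (even with equal contents) does not match: nothing is removed.
bool removedWrong = collection.Remove(new KeyValuePair<string, List<int>>("key", new List<int>()));

// The exact instance that is stored matches: the key is removed atomically.
bool removedRight = collection.Remove(new KeyValuePair<string, List<int>>("key", original));

Console.WriteLine($"{removedWrong} {removedRight} {d.ContainsKey("key")}"); // False True False

// On .NET 5 and later the same check is available publicly:
// d.TryRemove(new KeyValuePair<string, List<int>>("key", original));
```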

There are also third party implementations with (claimed) high performance and concurrency that exhibit a TryRemove (..., expectedValue).

Once an implementation is chosen, the following code implements the asked for functionality. It uses the atomic compare-and-swap operations provided by the dictionary in a loop until it succeeds (similar to what many concurrent dictionaries do internally, too). As far as I'm aware, this is a typical approach in lock-free algorithms.

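using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Use any third-party dictionary that offers TryRemove() with
// a value to compare against (two are mentioned above).
// ConcurrentDictionary itself exposes the same check through its explicit
// ICollection<KeyValuePair<TKey, TValue>>.Remove implementation, used below.
ConcurrentDictionary<TKey, List<TValue>> d;
...

// To remove a value from key:
// Loop until the compare-and-swap of either update or removal succeeded
while (true)
{
    // If the key does not exist, exit
    if (!d.TryGetValue (key, out var list)) {
        break;
    }

    // Remove the value from this key's entry:
    // Consider the old value immutable, copy-and-modify it instead.
    // TValue is generic, so compare with EqualityComparer<TValue> rather than !=
    List<TValue> newlist;
    lock (list) {
        newlist = list.Where (it => !EqualityComparer<TValue>.Default.Equals (it, valueToRemove)).ToList ();
    }

    // If the value list is not empty, compare-and-update it
    // (TryUpdate's third parameter is named comparisonValue)
    if (newlist.Count > 0) {
        if (d.TryUpdate (key: key, newValue: newlist, comparisonValue: list)) {
            return;
        }
    }
    else // The key's value list is empty - compare-and-remove the entire key
    {
        // Remove the key iff the associated value is still the same list instance
        // (List<T> does not override Equals, so this is a reference comparison)
        var expected = new KeyValuePair<TKey, List<TValue>> (key, list);
        if (((ICollection<KeyValuePair<TKey, List<TValue>>>) d).Remove (expected)) {
            return;
        }
    }

    // If we reach this point, the operation failed - try again
}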
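The loop above covers removal only. For it to be safe, the add side must also use compare-and-swap: if Add mutated the list in place (as in the question), a thread could add to a list that a concurrent compare-and-remove detaches a moment later, because the list reference itself is unchanged. A minimal sketch of both halves follows, assuming values are stored as arrays that are never mutated after being published (an assumption; the answer uses `List<TValue>`). `AddValue` and `RemoveValue` are hypothetical names, and the concurrent part is only a smoke test, not a proof of correctness.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var d = new ConcurrentDictionary<string, int[]>();

// Add: never mutate a published array; publish a new copy with compare-and-swap instead.
static void AddValue<TKey, TValue>(ConcurrentDictionary<TKey, TValue[]> dict, TKey key, TValue value)
{
    while (true)
    {
        if (!dict.TryGetValue(key, out var current))
        {
            if (dict.TryAdd(key, new[] { value })) return; // key was absent: create it
        }
        else if (dict.TryUpdate(key, current.Append(value).ToArray(), current))
        {
            return; // nobody replaced the array in the meantime
        }
        // Another thread won the race: re-read and retry.
    }
}

// Remove: publish a shorter copy, or remove the key only if it still maps to the array we read.
static bool RemoveValue<TKey, TValue>(ConcurrentDictionary<TKey, TValue[]> dict, TKey key, TValue value)
{
    var comparer = EqualityComparer<TValue>.Default;
    while (true)
    {
        if (!dict.TryGetValue(key, out var current)) return false;
        int index = Array.FindIndex(current, v => comparer.Equals(v, value));
        if (index < 0) return false;

        TValue[] updated = current.Where((_, i) => i != index).ToArray();
        bool done = updated.Length > 0
            ? dict.TryUpdate(key, updated, current)
            : ((ICollection<KeyValuePair<TKey, TValue[]>>)dict)
                .Remove(new KeyValuePair<TKey, TValue[]>(key, current)); // reference comparison
        if (done) return true;
    }
}

// Smoke test: each worker adds its own id and removes it again, so no entries should remain.
Parallel.For(0, 4, worker =>
{
    for (int i = 0; i < 10_000; i++)
    {
        string key = "k" + (i % 8);
        AddValue(d, key, worker);
        if (!RemoveValue(d, key, worker)) throw new InvalidOperationException("value lost");
    }
});

Console.WriteLine(d.Count); // 0
```

Because empty arrays are never stored, every key present in the dictionary has at least one value, and a fresh array is allocated on every change, so the reference comparison cannot be fooled by an old array coming back.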


Source: https://stackoverflow.com/questions/60695167/concurrentdictionary-with-multiple-values-per-key-removing-empty-entries
