Compacting a WeakReference Dictionary

I've got a class Foo with a property Id. My goal is that there are no two instances of Foo with the same Id at the same time.

So I created a factory method CreateFoo which uses a cache in order to return the same instance for the same Id.

static Foo CreateFoo(int id) {
    Foo foo;
    if (!cache.TryGetValue(id, out foo)) {
        foo = new Foo(id);
        foo.Initialize(...);
        cache.Put(id, foo);
    }
    return foo;
}

The cache is implemented as a Dictionary<TKey,WeakReference>, based on @JaredPar's Building a WeakReference Hashtable:

class WeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items;
    public WeakDictionary() {
        this.items = new Dictionary<TKey, WeakReference>();
    }
    public void Put(TKey key, TValue value) {
        this.items[key] = new WeakReference(value);
    }
    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        if (!this.items.TryGetValue(key, out weakRef)) {
            value = null;
            return false;
        } else {
            value = (TValue)weakRef.Target;
            return (value != null);
        }
    }
}

The problem is that the WeakReferences remain in the dictionary after their targets have been garbage collected. This implies the need for a strategy to manually "garbage collect" dead WeakReferences, as explained by @Pascal Cuoq in What happens to a WeakReference after GC of WeakReference.Target.

My question is: What's the best strategy to compact a WeakReference Dictionary?

The options that I see are:

1. Don't remove WeakReferences from the Dictionary. IMO this is bad, because the cache is used in the full lifetime of my application, and a lot of dead WeakReferences will accumulate over time.
2. Walk the entire dictionary on each Put and TryGetValue, and remove dead WeakReferences. This defeats somewhat the purpose of a dictionary because both operations become O(n).
3. Walk the entire dictionary periodically in a background thread. What would be a good interval, given that I don't know the usage pattern of CreateFoo?
4. Append each inserted KeyValuePair to a double-ended linked list. Each call to Put and TryGetValue examines the head of the list. If the WeakReference is alive, move the pair to the end of the list. If it is dead, remove the pair from the list and the WeakReference from the Dictionary.
5. Implement a custom hash table with the minor difference that, when a bucket is full, dead WeakReferences are first removed from the bucket before proceeding as usual.
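
The linked-list option might look roughly like the following. This is only a minimal sketch: the class name RotatingWeakDictionary and the Count property are made up for illustration, and a Queue<TKey> of keys stands in for the double-ended linked list. It is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every Put/TryGetValue examines one entry at the head
// of a queue and either rotates it to the back (alive) or drops it (dead).
class RotatingWeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items =
        new Dictionary<TKey, WeakReference>();
    // Invariant: holds every key of 'items' exactly once.
    private readonly Queue<TKey> order = new Queue<TKey>();

    public int Count { get { return items.Count; } }

    public void Put(TKey key, TValue value) {
        ExamineHead();
        // Enqueue only new keys so the queue never holds duplicates.
        if (!items.ContainsKey(key)) order.Enqueue(key);
        items[key] = new WeakReference(value);
    }

    public bool TryGetValue(TKey key, out TValue value) {
        ExamineHead();
        WeakReference weakRef;
        value = items.TryGetValue(key, out weakRef) ? (TValue)weakRef.Target : null;
        return value != null;
    }

    private void ExamineHead() {
        if (order.Count == 0) return;
        TKey head = order.Dequeue();
        if (items[head].IsAlive) order.Enqueue(head); // alive: move to the end
        else items.Remove(head);                      // dead: drop the entry
    }
}
```

Each operation does O(1) extra work, but a dead entry is only removed once it reaches the head of the queue, so dead entries can linger for up to one full rotation.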

Are there other strategies?

The best strategy is probably an algorithm with amortized time complexity. Does such a strategy exist?

Your Option 3 (a thread) has the big disadvantage of making synchronization necessary on all Put/TryGetValue actions. If you do use this, measure your interval not in milliseconds but in operations: scan every N TryGetValue actions.

Option 2, scanning the Dictionary, would incur a serious overhead. You can improve on it by scanning only once every 1000 actions and/or by watching how often the GC has run.
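
As an illustration of "watching how often the GC has run", here is a minimal sketch that only scans when GC.CollectionCount(0) has changed since the previous scan. The class name GcAwareWeakDictionary and the Count property are assumptions made for this example; the code is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of WeakDictionary: Put scans for dead entries only
// if a garbage collection has been observed since the previous scan.
class GcAwareWeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items =
        new Dictionary<TKey, WeakReference>();
    private int lastGcCount = GC.CollectionCount(0);

    public int Count { get { return items.Count; } }

    public void Put(TKey key, TValue value) {
        PurgeIfGcRan();
        items[key] = new WeakReference(value);
    }

    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        value = items.TryGetValue(key, out weakRef) ? (TValue)weakRef.Target : null;
        return value != null;
    }

    private void PurgeIfGcRan() {
        int gcCount = GC.CollectionCount(0);
        if (gcCount == lastGcCount) return; // no collection observed since last scan
        lastGcCount = gcCount;

        // Collect the keys first: a Dictionary must not be modified while enumerating.
        var deadKeys = new List<TKey>();
        foreach (var pair in items)
            if (!pair.Value.IsAlive) deadKeys.Add(pair.Key);
        foreach (var key in deadKeys) items.Remove(key);
    }
}
```

Under heavy allocation, collections can happen very often, so in practice this check would probably be combined with the "1 in 1000 actions" counter rather than used on its own.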

But I would seriously consider option 1: do nothing. You may have "a lot" of dead entries, but on the other hand they are pretty small (and get recycled). Probably not an option for a server app, but for a client application I would try to measure how many entries (in kilobytes) per hour we are talking about.

After some discussion:

Does such a[n amortized] strategy exist?

I would guess no. Your problem is a miniature version of the GC. You will have to scan the whole thing once in a while. So only options 2) and 3) provide a real solution. And they are both expensive but they can be (heavily) optimized with some heuristics. Option 2) would still give you the occasional worst-case though.

If you can switch the managed object to be the key of the dictionary, then you can use .NET 4.0's ConditionalWeakTable (namespace System.Runtime.CompilerServices).

According to Mr. Richter, ConditionalWeakTable is notified of object collection by the garbage collector rather than using a polling thread.

    static ConditionalWeakTable<TabItem, TIDExec> tidByTab = new ConditionalWeakTable<TabItem, TIDExec>();

    void Window_Loaded(object sender, RoutedEventArgs e)
    {
        ...
        dataGrid.SelectionChanged += (_sender, _e) =>
        {
            var cs = dataGrid.SelectedItem as ClientSession;

            this.tabControl.Items.Clear();

            foreach (var tid in cs.GetThreadIDs())
            {
                tid.tabItem = new TabItem() { Header = ... };
                tid.tabItem.AddHandler(UIElement.MouseDownEvent,
                    new MouseButtonEventHandler((__sender, __e) =>
                    {
                        tabControl_SelectionChanged(tid.tabItem);
                    }), true);
                tidByTab.Add(tid.tabItem, tid);
                this.tabControl.Items.Add(tid.tabItem);
            }
        };
    }

    void tabControl_SelectionChanged(TabItem tabItem)
    {
        this.tabControl.SelectedItem = tabItem;
        if (tidByTab.TryGetValue(tabControl.SelectedItem as TabItem, out tidExec))
        {
            tidExec.EnsureBlocksLoaded();
            ShowStmt(tidExec.CurrentStmt);
        }
        else
            throw new Exception("huh?");
    }

What's important here is that the only things referencing the TabItem object are the tabControl.Items collection and the key of the ConditionalWeakTable. The key of the ConditionalWeakTable does not count. So when we clear all the items from the tabControl, those TabItems can be garbage-collected (because nothing references them any longer; again, the key of the ConditionalWeakTable does not count). When they are garbage collected, the ConditionalWeakTable is notified and the entry with that key value is removed. So my bulky TIDExec objects are also garbage-collected at that point (nothing references them, except the value of the ConditionalWeakTable).

Approach #5 is interesting, but has the disadvantage that it could be difficult to know what the real level of hash-table utilization is, and consequently when the hash table should be expanded. That difficulty might be overcome if, whenever it "seems" like the hash table should be expanded, one first does a whole-table scan to remove dead entries. If more than half of the entries in the table were dead, don't bother expanding it. Such an approach should yield amortized O(1) behavior, since one wouldn't do the whole-table scan until one had added back as many entries as had been deleted.

A simpler approach, which would also yield O(1) amortized time and O(1) space per recently-live element would be to keep a count of how many items were alive after the last time the table was purged, and how many elements have been added since then. Whenever the latter count exceeds the first, do a whole-table scan-and-purge. The time required for a scan and purge will be proportional to the number of elements added between purges, thus retaining amortized O(1) time, and the number of total elements in the collection will not exceed twice the number of elements that were recently observed to be alive, so the number of dead elements cannot exceed twice the number of recently-live elements.
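
A minimal sketch of this counting rule, layered over the questioner's Dictionary<TKey, WeakReference>. The class name PurgingWeakDictionary, the field names, and the Count property are assumptions made for this example; it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: purge the whole table whenever the number of keys
// added since the last purge exceeds the number that survived that purge.
class PurgingWeakDictionary<TKey, TValue> where TValue : class {
    private readonly Dictionary<TKey, WeakReference> items =
        new Dictionary<TKey, WeakReference>();
    private int aliveAfterLastPurge = 0;
    private int addedSinceLastPurge = 0;

    public int Count { get { return items.Count; } }

    public void Put(TKey key, TValue value) {
        if (!items.ContainsKey(key)) addedSinceLastPurge++; // count new keys only
        items[key] = new WeakReference(value);
        if (addedSinceLastPurge > aliveAfterLastPurge) Purge();
    }

    public bool TryGetValue(TKey key, out TValue value) {
        WeakReference weakRef;
        value = items.TryGetValue(key, out weakRef) ? (TValue)weakRef.Target : null;
        return value != null;
    }

    private void Purge() {
        // O(n) scan, paid for by the additions made since the previous purge.
        var deadKeys = new List<TKey>();
        foreach (var pair in items)
            if (!pair.Value.IsAlive) deadKeys.Add(pair.Key);
        foreach (var key in deadKeys) items.Remove(key);

        aliveAfterLastPurge = items.Count;
        addedSinceLastPurge = 0;
    }
}
```

If every entry stays alive, purges happen after the 1st, 3rd, 7th, 15th, ... insertion, which is the same doubling pattern that makes growing a dynamic array amortized O(1).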

I had this same problem, and solved it like this (WeakDictionary is the class I was trying to clean up):

internal class CleanerRef
{
    ~CleanerRef()
    {
        if (handle.IsAllocated)
            handle.Free();
    }

    public CleanerRef(WeakDictionaryCleaner cleaner, WeakDictionary dictionary)
    {
        handle = GCHandle.Alloc(cleaner, GCHandleType.WeakTrackResurrection);
        Dictionary = dictionary;
    }

    public bool IsAlive
    {
        get {return handle.IsAllocated && handle.Target != null;}
    }

    public object Target
    {
        get {return IsAlive ? handle.Target : null;}
    }

    GCHandle handle;
    public WeakDictionary Dictionary;
}


internal class WeakDictionaryCleaner
{
    public WeakDictionaryCleaner(WeakDictionary dict)
    {
        refs.Add(new CleanerRef(this, dict));
    }

    ~WeakDictionaryCleaner()
    {
        foreach(var cleanerRef in refs)
        {
            if (cleanerRef.Target == this)
            {
                cleanerRef.Dictionary.ClearGcedEntries();
                refs.Remove(cleanerRef);
                break;
            }
        }
    }
    private static readonly List<CleanerRef> refs = new List<CleanerRef>();
}

What these two classes try to achieve is to "hook" the GC. You activate this mechanism by creating an instance of WeakDictionaryCleaner during the construction of the weak collection:

new WeakDictionaryCleaner(weakDictionary);

Notice that I don't keep any reference to the new instance, so the GC will collect it during the next cycle. In the ClearGcedEntries() method I create a new instance again, so that each GC cycle has a cleaner to finalize, which in turn runs the collection compaction. You can also make CleanerRef.Dictionary a weak reference so that it won't keep the dictionary in memory.

Hope this helps

I guess this is the right place to put it, even though it might look like necromancy, just in case someone stumbles upon this question like I did. The lack of a dedicated Identity Map in .NET is somewhat surprising, and I feel the most natural way for it to work is as described in the last option: when the table is full and about to double its capacity, it first checks whether there are enough dead entries that can be recycled, so that growing is not necessary.

static IdentityMap<int, Entity> Cache = new IdentityMap<int, Entity>(e => e.ID);
...
var entity = Cache.Get(id, () => LoadEntity(id));

The class exposes just one public method, Get, which takes a key and an optional value factory; the factory lazily loads and caches the entity if it is not already in the cache.

using System;
class IdentityMap<TKey, TValue>
    where TKey : IEquatable<TKey>
    where TValue : class
{
    Func<TValue, TKey> key_selector;
    WeakReference<TValue>[] references;
    int[] buckets;
    int[] bucket_indexes;
    int tail_index;
    int entries_count;
    int capacity;

    public IdentityMap(Func<TValue, TKey> key_selector, int capacity = 10) {
        this.key_selector = key_selector;
        Init(capacity);
    }
    void Init(int capacity) {
        this.bucket_indexes = new int[capacity];
        this.buckets = new int[capacity];
        this.references = new WeakReference<TValue>[capacity];
        for (int i = 0; i < capacity; i++) {
            bucket_indexes[i] = -1;
            buckets[i] = i - 1;
        }
        this.tail_index = capacity - 1;
        this.entries_count = 0;
        this.capacity = capacity;
    }

    public TValue Get(TKey key, Func<TValue> value = null) {
        int bucket_index = Math.Abs(key.GetHashCode() % this.capacity);
        var ret = WalkBucket(bucket_index, true, key);
        if (ret == null && value != null) Add(bucket_index, ret = value());
        return ret;
    }

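    void Add(int bucket_index, TValue value) {
        if (this.entries_count == this.capacity) {
            // Table is full: purge dead entries from every bucket first.
            for (int i = 0; i < capacity; i++) WalkBucket(i, false, default(TKey));
            // Grow only if more than half of the slots are still alive.
            if (this.entries_count * 2 > this.capacity) {
                var old_references = references;
                Init(this.capacity * 2);
                foreach (var old_reference in old_references) {
                    TValue old_value;
                    if (old_reference.TryGetTarget(out old_value)) {
                        // Rehash by the key of the entry being moved, not of the new value.
                        int hash = key_selector(old_value).GetHashCode();
                        Add(Math.Abs(hash % this.capacity), old_value);
                    }
                }
                // The capacity changed, so the caller's bucket index is stale.
                bucket_index = Math.Abs(key_selector(value).GetHashCode() % this.capacity);
            }
        }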
        int new_index = this.tail_index;
        this.tail_index = buckets[this.tail_index];
        this.entries_count += 1;
        buckets[new_index] = bucket_indexes[bucket_index];
        if (references[new_index] != null) references[new_index].SetTarget(value);
        else references[new_index] = new WeakReference<TValue>(value);
        bucket_indexes[bucket_index] = new_index;
    }

    TValue WalkBucket(int bucket_index, bool is_searching, TKey key) {
        int curr_index = bucket_indexes[bucket_index];
        int prev_index = -1;
        while (curr_index != -1) {
            TValue value;
            int next_index = buckets[curr_index];
            if (references[curr_index].TryGetTarget(out value)) {
                if (is_searching && key_selector(value).Equals(key)) return value;
                prev_index = curr_index;
            } else {
                if (prev_index != -1) buckets[prev_index] = next_index;
                else bucket_indexes[bucket_index] = next_index;

                buckets[curr_index] = this.tail_index;
                this.tail_index = curr_index;
                this.entries_count -= 1;
            }
            curr_index = next_index;
        }
        return null;
    }
}

You could remove the "invalid" WeakReference inside TryGetValue:

[Edit] My mistake, these solutions actually do nothing more than what you suggested, since the Put method will swap the old object with the new one anyway. Just ignore it.

public bool TryGetValue(TKey key, out TValue value) {
    WeakReference weakRef;
    if (!this.items.TryGetValue(key, out weakRef)) {
        value = null;
        return false;
    } else {
        value = (TValue)weakRef.Target;
        if (value == null)
            this.items.Remove(key);
        return (value != null);
    }
}

Or, you can immediately create a new instance inside your dictionary, whenever it is needed:

public TValue GetOrCreate(TKey key, Func<TKey, TValue> ctor) {
    WeakReference weakRef;
    if (this.items.TryGetValue(key, out weakRef)) {
        // Entry exists: return the target if it is still alive.
        TValue value = (TValue)weakRef.Target;
        if (value != null)
            return value;
    }

    // Missing or already collected: create a new instance and (re)register it.
    TValue result = ctor(key);
    this.Put(key, result);
    return result;
}

You would then use it like this:

static Foo CreateFoo(int id)
{
    // The lambda parameter needs its own name; reusing "id" would clash with the method parameter.
    return cache.GetOrCreate(id, key => new Foo(key));
}

[Edit]

According to WinDbg, a WeakReference instance alone occupies 16 bytes. For 100,000 collected objects, this would not be such a serious burden, so you could easily let them live.

If this is a server app and you believe you could benefit from collecting, I would consider going for a background thread, but also implementing a simple algorithm to increase waiting time whenever you collect a relatively small number of objects.
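
The "increase waiting time" idea could be expressed as a small back-off policy that the background thread consults after each scan. This is only a sketch: the PurgeBackoff class, the NextInterval method, and all thresholds and bounds (10%, 50%, 1 second, 10 minutes) are assumptions chosen for illustration.

```csharp
using System;

// Hypothetical back-off policy for a background purge thread.
static class PurgeBackoff {
    static readonly TimeSpan Min = TimeSpan.FromSeconds(1);
    static readonly TimeSpan Max = TimeSpan.FromMinutes(10);

    // removed: dead entries found by the last scan; scanned: entries examined.
    public static TimeSpan NextInterval(TimeSpan current, int removed, int scanned) {
        double deadRatio = scanned == 0 ? 0.0 : (double)removed / scanned;

        TimeSpan next = current;
        if (deadRatio < 0.10)
            next = TimeSpan.FromTicks(current.Ticks * 2); // little to collect: wait longer
        else if (deadRatio > 0.50)
            next = TimeSpan.FromTicks(current.Ticks / 2); // lots of garbage: scan sooner

        if (next < Min) next = Min;
        if (next > Max) next = Max;
        return next;
    }
}
```

The purge loop would sleep for the current interval, scan the dictionary under the same lock that Put and TryGetValue use, and then feed the scan result into NextInterval to choose the next sleep.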

A little specialization: when the target classes know the weak dictionary and their own TKey value, they can remove their entry from the finalizer.

public class Entry<TKey>
{
    TKey key;
    Dictionary<TKey, WeakReference> weakDictionary;

    public Entry(Dictionary<TKey, WeakReference> weakDictionary, TKey key)
    {
        this.key = key;
        this.weakDictionary = weakDictionary;
    }

    ~Entry()
    {
        weakDictionary.Remove(key);
    }
}

When cached objects are subclasses of Entry<TKey>, no empty WeakReference leaks, since the finalizer runs when the instance is garbage collected.

Source: https://stackoverflow.com/questions/2047591/compacting-a-weakreference-dictionary

Tags

.net-4.0

weak-references