Is there an IEnumerable implementation that only iterates over its source (e.g. a LINQ query) once?

我在风中等你 2020-12-08 08:17

Provided items is the result of a LINQ expression:

var items = from item in ItemsSource.RetrieveItems()
            where ...
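
Because the query is lazily evaluated, every enumeration of items re-executes it - and therefore ItemsSource.RetrieveItems() - from scratch. A minimal sketch of the problem, using a hypothetical generator as a stand-in for an expensive RetrieveItems():

using System;
using System.Collections.Generic;
using System.Linq;

static class Demo
{
    static IEnumerable<int> RetrieveItems()
    {
        // Runs once per enumeration of any query built on top of it.
        Console.WriteLine("hitting the expensive source");
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var items = from item in RetrieveItems()
                    where item > 0
                    select item;

        Console.WriteLine(items.Count()); // prints "hitting the expensive source"
        Console.WriteLine(items.Count()); // prints it again: the source ran twice
    }
}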

3 Answers
  • 2020-12-08 08:29
    // These members belong inside a static class (required for the extension
    // method) and need System.Collections, System.Collections.Generic and
    // System.Collections.Concurrent.
    public static IEnumerable<T> SingleEnumeration<T>(this IEnumerable<T> source)
    {
        return new SingleEnumerator<T>(source);
    }
    
    private class SingleEnumerator<T> : IEnumerable<T>
    {
        private CacheEntry<T> cacheEntry;
        public SingleEnumerator(IEnumerable<T> sequence)
        {
            cacheEntry = new CacheEntry<T>(sequence.GetEnumerator());
        }
    
        public IEnumerator<T> GetEnumerator()
        {
            if (cacheEntry.FullyPopulated)
            {
                return cacheEntry.CachedValues.GetEnumerator();
            }
            else
            {
                return iterateSequence<T>(cacheEntry).GetEnumerator();
            }
        }
    
        IEnumerator IEnumerable.GetEnumerator()
        {
            return this.GetEnumerator();
        }
    }
    
    private static IEnumerable<T> iterateSequence<T>(CacheEntry<T> entry)
    {
        using (var iterator = entry.CachedValues.GetEnumerator())
        {
            int i = 0;
            while (entry.ensureItemAt(i) && iterator.MoveNext())
            {
                yield return iterator.Current;
                i++;
            }
        }
    }
    
    private class CacheEntry<T>
    {
        public bool FullyPopulated { get; private set; }
        public ConcurrentQueue<T> CachedValues { get; private set; }
    
        private static object key = new object();
        private IEnumerator<T> sequence;
    
        public CacheEntry(IEnumerator<T> sequence)
        {
            this.sequence = sequence;
            CachedValues = new ConcurrentQueue<T>();
        }
    
        /// <summary>
        /// Ensure that the cache has an item at the provided index.  If not, take an item from the 
        /// input sequence and move it to the cache.
        /// 
        /// The method is thread safe.
        /// </summary>
        /// <returns>True if the cache already had enough items or 
        /// an item was moved to the cache, 
        /// false if there were no more items in the sequence.</returns>
        public bool ensureItemAt(int index)
        {
            //if the cache already has the item we don't need to lock to know we
            //can get it
            if (index < CachedValues.Count)
                return true;
            //if we're done there are no race conditions here either
            if (FullyPopulated)
                return false;
    
            lock (key)
            {
                //re-check the early-exit conditions in case they changed while we were
                //waiting on the lock.
    
                //we already have the cached item
                if (index < CachedValues.Count)
                    return true;
                //we don't have the cached item and there are no uncached items
                if (FullyPopulated)
                    return false;
    
                //we actually need to get the next item from the sequence.
                if (sequence.MoveNext())
                {
                    CachedValues.Enqueue(sequence.Current);
                    return true;
                }
                else
                {
                    FullyPopulated = true;
                    return false;
                }
            }
        }
    }
    

    So this has been edited (substantially) to support multithreaded access. Several threads can ask for items, and the items will be cached on an item-by-item basis as they are produced. It doesn't need to wait for the entire sequence to be iterated before it can return cached values. Below is a sample program that demonstrates this:

    //demo generator (requires System.Threading for Thread.Sleep and
    //System.Threading.Tasks for Task.Factory below)
    private static IEnumerable<int> interestingIntGenerationMethod(int maxValue)
    {
        for (int i = 0; i < maxValue; i++)
        {
            Thread.Sleep(1000);
            Console.WriteLine("actually generating value: {0}", i);
            yield return i;
        }
    }
    
    public static void Main(string[] args)
    {
        IEnumerable<int> sequence = interestingIntGenerationMethod(10)
            .SingleEnumeration();
    
        int numThreads = 3;
        for (int i = 0; i < numThreads; i++)
        {
            int taskID = i;
            Task.Factory.StartNew(() =>
            {
                foreach (int value in sequence)
                {
                    Console.WriteLine("Task: {0} Value:{1}",
                        taskID, value);
                }
            });
        }
    
        Console.WriteLine("Press any key to exit...");
        Console.ReadKey(true);
    }
    

    You really need to see it run to understand the power here. As soon as a single thread forces the next actual value to be generated, all of the remaining threads can immediately print that generated value, but each of them will wait if there are no uncached values left for it to print. (Obviously thread/thread-pool scheduling may result in one task taking longer than needed to print its value.)
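
    For illustration, one possible interleaving from the demo above (the exact ordering is scheduler-dependent, so a real run may differ):

    actually generating value: 0
    Task: 0 Value:0
    Task: 2 Value:0
    Task: 1 Value:0
    actually generating value: 1
    Task: 1 Value:1
    Task: 0 Value:1
    Task: 2 Value:1
    ...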

  • 2020-12-08 08:37

    Take a look at the Reactive Extensions library - there is a MemoizeAll() extension which will cache the items in your IEnumerable once they're accessed, and store them for future accesses.

    See this blog post by Bart De Smet for a good read on MemoizeAll and other Rx methods.

    Edit: This is now found in the separate Interactive Extensions package - available from NuGet or Microsoft Download.
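
    A minimal sketch, assuming the System.Interactive (Ix.NET) NuGet package, where the whole-sequence operator is called Memoize() rather than MemoizeAll():

    using System;
    using System.Collections.Generic;
    using System.Linq; // the EnumerableEx extensions from System.Interactive live here

    static class MemoizeDemo
    {
        static IEnumerable<int> Generate()
        {
            for (var i = 0; i < 3; i++)
            {
                Console.WriteLine("generating {0}", i);
                yield return i;
            }
        }

        static void Main()
        {
            // Memoize() caches each item the first time it is pulled, so the
            // underlying iterator runs at most once no matter how many times
            // the result is enumerated.
            var cached = Generate().Memoize();

            Console.WriteLine(cached.Sum()); // triggers "generating 0..2", prints 3
            Console.WriteLine(cached.Sum()); // prints 3 again, entirely from the cache
        }
    }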

  • 2020-12-08 08:42

    A fun challenge, so I have to provide my own solution. So fun, in fact, that my solution is now at version 3. Version 2 was a simplification I made based on feedback from Servy. I then realized that my solution had a huge drawback: if the first enumeration of the cached enumerable didn't complete, no caching would be done. Many LINQ extensions, like First and Take, only enumerate enough of the enumerable to get the job done, so I had to update to version 3 to make this work with caching.

    The question is about subsequent enumerations of the enumerable, which does not involve concurrent access. Nevertheless, I have decided to make my solution thread safe. It adds some complexity and a bit of overhead, but it should allow the solution to be used in all scenarios.

    using System;
    using System.Collections;
    using System.Collections.Generic;
    
    public static class EnumerableExtensions {
    
      public static IEnumerable<T> Cached<T>(this IEnumerable<T> source) {
        if (source == null)
          throw new ArgumentNullException("source");
        return new CachedEnumerable<T>(source);
      }
    
    }
    
    class CachedEnumerable<T> : IEnumerable<T> {
    
      readonly Object gate = new Object();
    
      readonly IEnumerable<T> source;
    
      readonly List<T> cache = new List<T>();
    
      IEnumerator<T> enumerator;
    
      bool isCacheComplete;
    
      public CachedEnumerable(IEnumerable<T> source) {
        this.source = source;
      }
    
      public IEnumerator<T> GetEnumerator() {
        lock (this.gate) {
          if (this.isCacheComplete)
            return this.cache.GetEnumerator();
          if (this.enumerator == null)
            this.enumerator = source.GetEnumerator();
        }
        return GetCacheBuildingEnumerator();
      }
    
      public IEnumerator<T> GetCacheBuildingEnumerator() {
        var index = 0;
        T item;
        while (TryGetItem(index, out item)) {
          yield return item;
          index += 1;
        }
      }
    
      bool TryGetItem(Int32 index, out T item) {
        lock (this.gate) {
          if (!IsItemInCache(index)) {
            // The iteration may have completed while waiting for the lock.
            if (this.isCacheComplete) {
              item = default(T);
              return false;
            }
            if (!this.enumerator.MoveNext()) {
              item = default(T);
              this.isCacheComplete = true;
              this.enumerator.Dispose();
              return false;
            }
            this.cache.Add(this.enumerator.Current);
          }
          item = this.cache[index];
          return true;
        }
      }
    
      bool IsItemInCache(Int32 index) {
        return index < this.cache.Count;
      }
    
      IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
      }
    
    }
    

    The extension is used like this (sequence is an IEnumerable<T>):

    var cachedSequence = sequence.Cached();
    
    // Pulling 2 items from the sequence.
    foreach (var item in cachedSequence.Take(2))
      // ...
    
    // Pulling 2 items from the cache and the rest from the source.
    foreach (var item in cachedSequence)
      // ...
    
    // Pulling all items from the cache.
    foreach (var item in cachedSequence)
      // ...
    

    There is a slight leak if only part of the enumerable is enumerated (e.g. cachedSequence.Take(2).ToList()). The enumerator used by ToList will be disposed, but the underlying source enumerator is not. This is because the first 2 items are cached and the source enumerator is kept alive in case subsequent items are requested. In that case the source enumerator is only cleaned up when it becomes eligible for garbage collection (which will be at the same time as the possibly large cache).
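
    If deterministic cleanup matters, one option (an illustrative addition, not part of the code above) is to let CachedEnumerable<T> also implement IDisposable so the source enumerator can be released early. A sketch of the extra member:

    // Declare the class as: class CachedEnumerable<T> : IEnumerable<T>, IDisposable
    public void Dispose() {
      lock (this.gate) {
        if (this.enumerator != null) {
          this.enumerator.Dispose();
          this.enumerator = null;
        }
        // Mark the cache complete so later enumerations yield only the items
        // cached before Dispose was called, instead of touching the
        // now-disposed source enumerator.
        this.isCacheComplete = true;
      }
    }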
