Split List into Sublists with LINQ

灰色年华 2020-11-21 06:26

Is there any way I can separate a List<SomeObject> into several separate lists of SomeObject, using the item index as the delimiter of each s

30 answers
  •  轮回少年
    2020-11-21 06:58

    In general, the approach suggested by CaseyB works fine; in fact, if you are passing in a List<T> it is hard to fault it. Perhaps I would change it to:

    public static IEnumerable<IEnumerable<T>> ChunkTrivialBetter<T>(this IEnumerable<T> source, int chunksize)
    {
       var pos = 0; 
       while (source.Skip(pos).Any())
       {
          yield return source.Skip(pos).Take(chunksize);
          pos += chunksize;
       }
    }
    

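    This avoids massive call chains. Nonetheless, the approach has a general flaw: it materializes two enumerations per chunk (one for the Any() check and one for the Take), and each of them restarts from the beginning of the source. To highlight the issue, try running: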

    foreach (var item in Enumerable.Range(1, int.MaxValue).Chunk(8).Skip(100000).First())
    {
       Console.WriteLine(item);
    }
    // wait forever 
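
To make the cost visible on a small input, here is a minimal sketch that counts how many items are pulled from the source when the Skip/Take strategy is used. The names ReEnumerationDemo, CountingRange, SkipTakeChunk and Pulled are hypothetical and exist only for this illustration; the chunking logic mirrors ChunkTrivialBetter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    // Hypothetical counter: total items handed out by the source across all passes.
    public static int Pulled;

    // A source that counts every item it produces.
    static IEnumerable<int> CountingRange(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Same Skip/Take strategy as ChunkTrivialBetter.
    static IEnumerable<IEnumerable<T>> SkipTakeChunk<T>(IEnumerable<T> source, int chunksize)
    {
        var pos = 0;
        while (source.Skip(pos).Any())
        {
            yield return source.Skip(pos).Take(chunksize);
            pos += chunksize;
        }
    }

    // Consumes every chunk in order and reports how many items the source produced.
    public static int Run(int count, int chunksize)
    {
        Pulled = 0;
        foreach (var chunk in SkipTakeChunk(CountingRange(count), chunksize))
        {
            foreach (var item in chunk) { /* consume the chunk */ }
        }
        return Pulled;
    }

    public static void Main()
    {
        // The source holds only 100 items, yet far more than 100 are pulled,
        // because every Skip(pos) walks the source again from the start.
        Console.WriteLine(Run(100, 10));
    }
}
```

The total grows roughly quadratically with the number of chunks, which is why skipping to chunk 100000 of a huge range never finishes in practice.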
    

    To overcome this we can try Cameron's approach, which passes the above test with flying colors because it only walks the enumeration once.

    The trouble is that it has a different flaw: it materializes every item in each chunk, so memory usage can run high.

    To illustrate that try running:

    foreach (var item in Enumerable.Range(1, int.MaxValue)
                   .Select(x => x + new string('x', 100000))
                   .Clump(10000).Skip(100).First())
    {
       Console.Write('.');
    }
    // OutOfMemoryException
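
For reference, a single-pass chunker that buffers each chunk might look like the sketch below. This is not Cameron's code, only an illustration of the buffering strategy described above; the name BufferedChunk is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferedChunkDemo
{
    // Hypothetical single-pass chunker: walks the source once and
    // copies each chunk into its own List<T> before yielding it.
    public static IEnumerable<List<T>> BufferedChunk<T>(this IEnumerable<T> source, int chunksize)
    {
        if (chunksize < 1) throw new ArgumentOutOfRangeException(nameof(chunksize));

        var buffer = new List<T>();
        foreach (var item in source)
        {
            buffer.Add(item);
            if (buffer.Count == chunksize)
            {
                yield return buffer;
                buffer = new List<T>();
            }
        }

        // Emit the final, possibly shorter, chunk.
        if (buffer.Count > 0) yield return buffer;
    }

    public static void Main()
    {
        foreach (var chunk in Enumerable.Range(1, 7).BufferedChunk(3))
        {
            Console.WriteLine(string.Join(",", chunk));
        }
        // prints:
        // 1,2,3
        // 4,5,6
        // 7
    }
}
```

Every item of a chunk is held in memory until the chunk is complete, even if the caller skips that chunk, which is the source of the memory pressure shown in the example above.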
    

    Finally, any implementation should be able to handle out-of-order iteration of chunks, for example:

    Enumerable.Range(1,3).Chunk(2).Reverse().ToArray()
    // should return [3],[1,2]
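
To see why this requirement is easy to miss, consider a deliberately naive lazy chunker in which every chunk reads from one shared enumerator. The sketch below is hypothetical (NaiveLazyChunk and TakeFromShared are not taken from any of the answers mentioned here); it only shows the failure mode.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedEnumeratorDemo
{
    // Hypothetical naive chunker: all inner chunks share one enumerator,
    // so a chunk is only valid if it is consumed before the next is requested.
    public static IEnumerable<IEnumerable<T>> NaiveLazyChunk<T>(IEnumerable<T> source, int chunksize)
    {
        using (var e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                yield return TakeFromShared(e, chunksize);
            }
        }
    }

    static IEnumerable<T> TakeFromShared<T>(IEnumerator<T> e, int chunksize)
    {
        var taken = 0;
        do
        {
            yield return e.Current;
            taken++;
        } while (taken < chunksize && e.MoveNext());
    }

    // Consumes each chunk fully before asking for the next one.
    public static int CountInOrder()
    {
        var count = 0;
        foreach (var chunk in NaiveLazyChunk(Enumerable.Range(1, 3), 2))
        {
            foreach (var item in chunk) { /* consume */ }
            count++;
        }
        return count;
    }

    // Collects all chunks first, as Reverse() or ToArray() would do internally.
    public static int CountBuffered()
    {
        return NaiveLazyChunk(Enumerable.Range(1, 3), 2).ToList().Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountInOrder());  // prints 2
        Console.WriteLine(CountBuffered()); // prints 3
    }
}
```

When the outer sequence is collected before any chunk is read, each outer step advances the shared enumerator by a single item, so three chunks are produced instead of two, and their contents are no longer reliable once the enumerator has been disposed.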
    

    Many highly optimized solutions, like my first revision of this answer, failed there. The same issue can be seen in casperOne's optimized answer.

    To address all these issues you can use the following:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    namespace ChunkedEnumerator
    {
        public static class Extensions 
        {
            class ChunkedEnumerable<T> : IEnumerable<T>
            {
                class ChildEnumerator : IEnumerator<T>
                {
                    ChunkedEnumerable<T> parent;
                    int position;
                    bool done = false;
                    T current;
    
    
                    public ChildEnumerator(ChunkedEnumerable<T> parent)
                    {
                        this.parent = parent;
                        position = -1;
                        parent.wrapper.AddRef();
                    }
    
                    public T Current
                    {
                        get
                        {
                            if (position == -1 || done)
                            {
                                throw new InvalidOperationException();
                            }
                            return current;
    
                        }
                    }
    
                    public void Dispose()
                    {
                        if (!done)
                        {
                            done = true;
                            parent.wrapper.RemoveRef();
                        }
                    }
    
                    object System.Collections.IEnumerator.Current
                    {
                        get { return Current; }
                    }
    
                    public bool MoveNext()
                    {
                        position++;
    
                        if (position + 1 > parent.chunkSize)
                        {
                            done = true;
                        }
    
                        if (!done)
                        {
                            done = !parent.wrapper.Get(position + parent.start, out current);
                        }
    
                        return !done;
    
                    }
    
                    public void Reset()
                    {
                        // per http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset.aspx
                        throw new NotSupportedException();
                    }
                }
    
                EnumeratorWrapper<T> wrapper;
                int chunkSize;
                int start;
    
                public ChunkedEnumerable(EnumeratorWrapper<T> wrapper, int chunkSize, int start)
                {
                    this.wrapper = wrapper;
                    this.chunkSize = chunkSize;
                    this.start = start;
                }
    
                public IEnumerator<T> GetEnumerator()
                {
                    return new ChildEnumerator(this);
                }
    
                System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
                {
                    return GetEnumerator();
                }
    
            }
    
            class EnumeratorWrapper<T>
            {
                public EnumeratorWrapper (IEnumerable<T> source)
                {
                    SourceEumerable = source;
                }
                IEnumerable<T> SourceEumerable {get; set;}
    
                Enumeration currentEnumeration;
    
                class Enumeration
                {
                    public IEnumerator<T> Source { get; set; }
                    public int Position { get; set; }
                    public bool AtEnd { get; set; }
                }
    
                public bool Get(int pos, out T item) 
                {
    
                    if (currentEnumeration != null && currentEnumeration.Position > pos)
                    {
                        currentEnumeration.Source.Dispose();
                        currentEnumeration = null;
                    }
    
                    if (currentEnumeration == null)
                    {
                        currentEnumeration = new Enumeration { Position = -1, Source = SourceEumerable.GetEnumerator(), AtEnd = false };
                    }
    
                    item = default(T);
                    if (currentEnumeration.AtEnd)
                    {
                        return false;
                    }
    
                    while(currentEnumeration.Position < pos) 
                    {
                        currentEnumeration.AtEnd = !currentEnumeration.Source.MoveNext();
                        currentEnumeration.Position++;
    
                        if (currentEnumeration.AtEnd) 
                        {
                            return false;
                        }
    
                    }
    
                    item = currentEnumeration.Source.Current;
    
                    return true;
                }
    
                int refs = 0;
    
                // needed for dispose semantics 
                public void AddRef()
                {
                    refs++;
                }
    
                public void RemoveRef()
                {
                    refs--;
                    if (refs == 0 && currentEnumeration != null)
                    {
                        var copy = currentEnumeration;
                        currentEnumeration = null;
                        copy.Source.Dispose();
                    }
                }
            }
    
            public static IEnumerable<IEnumerable<T>> Chunk<T>(this IEnumerable<T> source, int chunksize)
            {
                if (chunksize < 1) throw new InvalidOperationException();
    
                var wrapper =  new EnumeratorWrapper<T>(source);
    
                int currentPos = 0;
                T ignore;
                try
                {
                    wrapper.AddRef();
                    while (wrapper.Get(currentPos, out ignore))
                    {
                        yield return new ChunkedEnumerable<T>(wrapper, chunksize, currentPos);
                        currentPos += chunksize;
                    }
                }
                finally
                {
                    wrapper.RemoveRef();
                }
            }
        }
    
        class Program
        {
            static void Main(string[] args)
            {
                int i = 10;
                foreach (var group in Enumerable.Range(1, int.MaxValue).Skip(10000000).Chunk(3))
                {
                    foreach (var n in group)
                    {
                        Console.Write(n);
                        Console.Write(" ");
                    }
                    Console.WriteLine();
                    if (i-- == 0) break;
                }
    
    
                var stuffs = Enumerable.Range(1, 10).Chunk(2).ToArray();
    
                foreach (var idx in new [] {3,2,1})
                {
                    Console.Write("idx " + idx + " ");
                    foreach (var n in stuffs[idx])
                    {
                        Console.Write(n);
                        Console.Write(" ");
                    }
                    Console.WriteLine();
                }
    
                /*
    
    10000001 10000002 10000003
    10000004 10000005 10000006
    10000007 10000008 10000009
    10000010 10000011 10000012
    10000013 10000014 10000015
    10000016 10000017 10000018
    10000019 10000020 10000021
    10000022 10000023 10000024
    10000025 10000026 10000027
    10000028 10000029 10000030
    10000031 10000032 10000033
    idx 3 7 8
    idx 2 5 6
    idx 1 3 4
                 */
    
                Console.ReadKey();
    
    
            }
    
        }
    }
    

    There is also a round of optimisations you could introduce for out-of-order iteration of chunks, which is out of scope here.

    As to which method you should choose, it depends entirely on the problem you are trying to solve. If you are not concerned with the first flaw, the simple answer is incredibly appealing.

    Note that, as with most methods, this is not safe for multithreading, and things can get weird. If you wish to make it thread safe, you would need to amend EnumeratorWrapper.
