Question
I want to take an IEnumerable<T> and split it up into fixed-sized chunks.
I have this, but it seems inelegant due to all the list creation/copying:
private static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    List<T> partition = new List<T>(partitionSize);
    foreach (T item in items)
    {
        partition.Add(item);
        if (partition.Count == partitionSize)
        {
            yield return partition;
            partition = new List<T>(partitionSize);
        }
    }
    // Cope with items.Count % partitionSize != 0
    if (partition.Count > 0) yield return partition;
}
Is there something more idiomatic?
EDIT: Although this has been marked as a duplicate of "Divide array into an array of subsequence array", it is not: that question deals with splitting an array, whereas this one is about IEnumerable<T>. In addition, that question requires that the last subsequence be padded. The two questions are closely related but aren't the same.
Answer 1:
You could implement the Batch method mentioned above yourself, like this:
static class MyLinqExtensions
{
    public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int batchSize)
    {
        using (var enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return YieldBatchElements(enumerator, batchSize - 1);
    }

    private static IEnumerable<T> YieldBatchElements<T>(
        IEnumerator<T> source, int batchSize)
    {
        yield return source.Current;
        for (int i = 0; i < batchSize && source.MoveNext(); i++)
            yield return source.Current;
    }
}
I've grabbed this code from http://blogs.msdn.com/b/pfxteam/archive/2012/11/16/plinq-and-int32-maxvalue.aspx.
UPDATE: Please note that this implementation lazily evaluates not only the batches but also the items inside each batch. It therefore produces correct results only when each batch is enumerated after all previous batches have been fully enumerated. For example:
public static void Main(string[] args)
{
    var xs = Enumerable.Range(1, 20);
    Print(xs.Batch(5).Skip(1)); // should skip first batch with 5 elements
}

public static void Print<T>(IEnumerable<IEnumerable<T>> batches)
{
    foreach (var batch in batches)
    {
        Console.WriteLine($"[{string.Join(", ", batch)}]");
    }
}
will output:
[2, 3, 4, 5, 6] //only first element is skipped.
[7, 8, 9, 10, 11]
[12, 13, 14, 15, 16]
[17, 18, 19, 20]
So, if your use case guarantees that batches are evaluated strictly sequentially, the lazy solution above will work. Otherwise, if you can't guarantee strictly sequential batch processing (e.g. when you want to process batches in parallel), you will probably need a solution that eagerly enumerates each batch's content, similar to the one in the question above or the one in MoreLINQ.
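As a minimal sketch of the eager alternative described above: each batch is copied into its own list before it is yielded, so skipping or reordering batches no longer disturbs the shared enumerator. The names EagerBatchExtensions and BatchEagerly are made up for illustration and are not taken from MoreLINQ or the linked blog post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names, for illustration only.
public static class EagerBatchExtensions
{
    public static IEnumerable<IReadOnlyList<T>> BatchEagerly<T>(
        this IEnumerable<T> source, int batchSize)
    {
        var bucket = new List<T>(batchSize);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == batchSize)
            {
                // The bucket is complete before the caller ever sees it.
                yield return bucket;
                bucket = new List<T>(batchSize);
            }
        }

        // Emit the final, possibly shorter, batch.
        if (bucket.Count > 0)
            yield return bucket;
    }
}
```

With this version, Enumerable.Range(1, 20).BatchEagerly(5).Skip(1) yields the batches 6..10, 11..15 and 16..20, at the cost of buffering one batch at a time.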
Answer 2:
Maybe?
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items.Select((item, inx) => new { item, inx })
                .GroupBy(x => x.inx / partitionSize)
                .Select(g => g.Select(x => x.item));
}
There is an already implemented one too: morelinq's Batch.
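One thing worth knowing about the GroupBy approach: GroupBy is deferred but not streaming, so it reads the whole source before the first group comes out. The sketch below counts how many items have been pulled from the source by the time the first chunk is available. The class and method names are hypothetical, chosen only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, for illustration only.
public static class GroupByBufferingDemo
{
    // Returns how many source items were consumed before the first chunk was produced.
    public static int ItemsPulledBeforeFirstChunk(int total, int partitionSize)
    {
        int pulled = 0;

        // Local iterator that counts every item it hands out.
        IEnumerable<int> CountingSource()
        {
            for (int i = 0; i < total; i++)
            {
                pulled++;
                yield return i;
            }
        }

        var chunks = CountingSource()
            .Select((item, inx) => new { item, inx })
            .GroupBy(x => x.inx / partitionSize)
            .Select(g => g.Select(x => x.item));

        using (var e = chunks.GetEnumerator())
        {
            e.MoveNext(); // ask for the first chunk only
        }

        return pulled;
    }
}
```

ItemsPulledBeforeFirstChunk(10, 3) returns 10 rather than 3, so this approach is a poor fit for infinite or very large sequences.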
Answer 3:
It feels like you want two iterator blocks ("yield return" methods). I wrote this extension method:
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class Extensions
{
    public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
    {
        return new PartitionHelper<T>(items, partitionSize);
    }

    private sealed class PartitionHelper<T> : IEnumerable<IEnumerable<T>>
    {
        readonly IEnumerable<T> items;
        readonly int partitionSize;
        bool hasMoreItems;

        internal PartitionHelper(IEnumerable<T> i, int ps)
        {
            items = i;
            partitionSize = ps;
        }

        public IEnumerator<IEnumerable<T>> GetEnumerator()
        {
            using (var enumerator = items.GetEnumerator())
            {
                hasMoreItems = enumerator.MoveNext();
                while (hasMoreItems)
                    yield return GetNextBatch(enumerator).ToList();
            }
        }

        // Yields up to partitionSize items, leaving the enumerator on the
        // first item of the next batch (or at the end of the sequence).
        IEnumerable<T> GetNextBatch(IEnumerator<T> enumerator)
        {
            for (int i = 0; i < partitionSize; ++i)
            {
                yield return enumerator.Current;
                hasMoreItems = enumerator.MoveNext();
                if (!hasMoreItems)
                    yield break;
            }
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }
}
Answer 4:
Craziest solution (with Reactive Extensions):
public static IEnumerable<IList<T>> Partition<T>(this IEnumerable<T> items, int partitionSize)
{
    return items
        .ToObservable()         // Convert the sequence to an observable sequence
        .Buffer(partitionSize)  // Split it into partitions of the specified size
        .ToEnumerable();        // Convert it back to an ordinary sequence
}
I know that I changed the signature, but we all know that each chunk will be some fixed-size collection anyway.
BTW, if you use an iterator block, do not forget to split your implementation into two methods so that arguments are validated eagerly!
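The reason for the split is that the body of an iterator block does not run until the first MoveNext call, so argument checks placed inside it fire late, far from the faulty call site. A minimal sketch of the two-method pattern, assuming the hypothetical names CheckedPartitionExtensions, PartitionChecked and PartitionIterator:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names, for illustration only.
public static class CheckedPartitionExtensions
{
    // Not an iterator block, so these checks run as soon as the method is called.
    public static IEnumerable<IList<T>> PartitionChecked<T>(
        this IEnumerable<T> items, int partitionSize)
    {
        if (items == null)
            throw new ArgumentNullException(nameof(items));
        if (partitionSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionSize));

        return PartitionIterator(items, partitionSize);
    }

    // The iterator block proper; it only ever sees validated arguments.
    private static IEnumerable<IList<T>> PartitionIterator<T>(
        IEnumerable<T> items, int partitionSize)
    {
        var partition = new List<T>(partitionSize);
        foreach (var item in items)
        {
            partition.Add(item);
            if (partition.Count == partitionSize)
            {
                yield return partition;
                partition = new List<T>(partitionSize);
            }
        }

        if (partition.Count > 0)
            yield return partition;
    }
}
```

Calling new[] { 1, 2, 3 }.PartitionChecked(0) throws ArgumentOutOfRangeException immediately, even if the result is never enumerated.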
Answer 5:
For an elegant solution, you can also have a look at MoreLinq.Batch.
It batches the source sequence into sized buckets.
Example:
int[] ints = new int[] {1,2,3,4,5,6};
var batches = ints.Batch(2); // batches -> [0] : 1,2 ; [1]:3,4 ; [2] :5,6
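If you are on .NET 6 or later, the base class library has a comparable built-in, Enumerable.Chunk, which returns each chunk as an array. A minimal sketch, assuming a .NET 6+ target; the ChunkDemo class and SplitInPairs method are hypothetical wrappers used only to show the call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper, for illustration only. Requires .NET 6 or later.
public static class ChunkDemo
{
    public static List<int[]> SplitInPairs(IEnumerable<int> source)
    {
        // Enumerable.Chunk(size) yields arrays of at most `size` elements;
        // only the last array can be shorter.
        return source.Chunk(2).ToList();
    }
}
```

For the input 1, 2, 3, 4, 5 this produces three arrays: {1, 2}, {3, 4} and {5}.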
Answer 6:
public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> items,
                                                       int partitionSize)
{
    int i = 0;
    return items.GroupBy(x => i++ / partitionSize).ToArray();
}
Answer 7:
You can do this using an overload of Enumerable.GroupBy and taking advantage of integer division.
return items.Select((element, index) => new { Element = element, Index = index })
            .GroupBy(obj => obj.Index / partitionSize, (_, partition) => partition);
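To see why integer division works as the grouping key, this small sketch lists the key computed for each index. The names IndexKeyDemo and KeysFor are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class IndexKeyDemo
{
    // Returns the grouping key (index / partitionSize) for indices 0..count-1.
    public static int[] KeysFor(int count, int partitionSize)
    {
        return Enumerable.Range(0, count)
            .Select(index => index / partitionSize)
            .ToArray();
    }
}
```

With partitionSize = 3, indices 0 to 6 map to the keys 0, 0, 0, 1, 1, 1, 2, so every run of three consecutive elements shares a key and the last group holds the remainder.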
Answer 8:
How about the partitioner classes in the System.Collections.Concurrent namespace?
Source: https://stackoverflow.com/questions/13709626/split-an-ienumerablet-into-fixed-sized-chunks-return-an-ienumerableienumerab