问题
I am developing a c# program which has an "IEnumerable users" that stores the ids of 4 million users. I need to loop through the Ienummerable and extract a batch 1000 ids each time to perform some operations in another method.
How do I extract 1000 ids at a time from start of the Ienumerable ...do some thing else then fetch the next batch of 1000 and so on ?
Is this possible?
回答1:
Sounds like you need to use Skip and Take methods of your object. Example:
users.Skip(1000).Take(1000)
this would skip the first 1000 and take the next 1000. You'd just need to increase the amount skipped with each call
You could use an integer variable with the parameter for Skip and you can adjust how much is skipped. You can then call it in a method.
public IEnumerable<user> GetBatch(int pageNumber)
{
return users.Skip(pageNumber * 1000).Take(1000);
}
回答2:
You can use MoreLINQ's Batch operator (available from NuGet):
foreach(IEnumerable<User> batch in users.Batch(1000))
// use batch
If simple usage of library is not an option, you can reuse implementation:
public static IEnumerable<IEnumerable<T>> Batch<T>(
this IEnumerable<T> source, int size)
{
T[] bucket = null;
var count = 0;
foreach (var item in source)
{
if (bucket == null)
bucket = new T[size];
bucket[count++] = item;
if (count != size)
continue;
yield return bucket.Select(x => x);
bucket = null;
count = 0;
}
// Return the last bucket with all remaining elements
if (bucket != null && count > 0)
{
Array.Resize(ref bucket, count);
yield return bucket.Select(x => x);
}
}
BTW for performance you can simply return bucket without calling Select(x => x)
. Select is optimized for arrays, but selector delegate still would be invoked on each item. So, in your case it's better to use
yield return bucket;
回答3:
The easiest way to do this is probably just to use the GroupBy method in LINQ:
var batches = myEnumerable
.Select((x, i) => new { x, i })
.GroupBy(p => (p.i / 1000), (p, i) => p.x);
But for a more sophisticated solution, see this blog post on how to create your own extension method to do this. Duplicated here for posterity:
public static IEnumerable<IEnumerable<T>> Batch<T>(this IEnumerable<T> collection, int batchSize)
{
List<T> nextbatch = new List<T>(batchSize);
foreach (T item in collection)
{
nextbatch.Add(item);
if (nextbatch.Count == batchSize)
{
yield return nextbatch;
nextbatch = new List<T>();
// or nextbatch.Clear(); but see Servy's comment below
}
}
if (nextbatch.Count > 0)
yield return nextbatch;
}
回答4:
try using this:
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(
this IEnumerable<TSource> source,
int batchSize)
{
var batch = new List<TSource>();
foreach (var item in source)
{
batch.Add(item);
if (batch.Count == batchSize)
{
yield return batch;
batch = new List<TSource>();
}
}
if (batch.Any()) yield return batch;
}
and to use above function:
foreach (var list in Users.Batch(1000))
{
}
回答5:
You can achieve that using Take and Skip Enumerable extension method. For more information on usage checkout linq 101
回答6:
Something like this would work:
List<MyClass> batch = new List<MyClass>();
foreach (MyClass item in items)
{
batch.Add(item);
if (batch.Count == 1000)
{
// Perform operation on batch
batch.Clear();
}
}
// Process last batch
if (batch.Any())
{
// Perform operation on batch
}
And you could generalize this into a generic method, like this:
static void PerformBatchedOperation<T>(IEnumerable<T> items,
Action<IEnumerable<T>> operation,
int batchSize)
{
List<T> batch = new List<T>();
foreach (T item in items)
{
batch.Add(item);
if (batch.Count == batchSize)
{
operation(batch);
batch.Clear();
}
}
// Process last batch
if (batch.Any())
{
operation(batch);
}
}
回答7:
How about
int batchsize = 5;
List<string> colection = new List<string> { "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"};
for (int x = 0; x < Math.Ceiling((decimal)colection.Count / batchsize); x++)
{
var t = colection.Skip(x * batchsize).Take(batchsize);
}
回答8:
You can use Take operator linq
Link : http://msdn.microsoft.com/fr-fr/library/vstudio/bb503062.aspx
回答9:
In a streaming context, where the enumerator might get blocked in the middle of the batch, simply because the value is not yet produced (yield) it is useful to have a timeout method so that the last batch is produced after a given time. I used this for example for tailing a cursor in MongoDB. It's a little bit complicated, because the enumeration has to be done in another thread.
public static IEnumerable<List<T>> TimedBatch<T>(this IEnumerable<T> collection, double timeoutMilliseconds, long maxItems)
{
object _lock = new object();
List<T> batch = new List<T>();
AutoResetEvent yieldEventTriggered = new AutoResetEvent(false);
AutoResetEvent yieldEventFinished = new AutoResetEvent(false);
bool yieldEventTriggering = false;
var task = Task.Run(delegate
{
foreach (T item in collection)
{
lock (_lock)
{
batch.Add(item);
if (batch.Count == maxItems)
{
yieldEventTriggering = true;
yieldEventTriggered.Set();
}
}
if (yieldEventTriggering)
{
yieldEventFinished.WaitOne(); //wait for the yield to finish, and batch to be cleaned
yieldEventTriggering = false;
}
}
});
while (!task.IsCompleted)
{
//Wait for the event to be triggered, or the timeout to finish
yieldEventTriggered.WaitOne(TimeSpan.FromMilliseconds(timeoutMilliseconds));
lock (_lock)
{
if (batch.Count > 0) //yield return only if the batch accumulated something
{
yield return batch;
batch.Clear();
yieldEventFinished.Set();
}
}
}
task.Wait();
}
来源:https://stackoverflow.com/questions/15414347/how-to-loop-through-ienumerable-in-batches