Question
I have 9 GB of data, and I want only 10 rows. When I do:
data.OrderBy(datum => datum.Column1)
.Take(10)
.ToArray();
I get an OutOfMemoryException. I would like to use an OrderByAndTake method optimized for lower memory consumption. It's easy to write, but I guess someone already did. Where can I find it?
Edit: It's LINQ to Objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than those in the current list of the 10 biggest values.
Answer 1:
I'm assuming you're doing this in Linq to Objects. You could do something like...
var best = data
    .Aggregate(new List<T>(), (soFar, current) => soFar
        .Concat(new[] { current })
        .OrderBy(datum => datum.Column1)
        .Take(10)
        .ToList());
In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.
This is the least-code way. Since you know the soFar list is sorted, testing where (and whether) to insert current could be optimized. I didn't feel like doing ALL the work for you. ;-)
PS: Replace T with whatever your type is.
EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of the best 10.
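A minimal sketch of that foreach version, keeping the 10 rows with the smallest Column1 like the Aggregate code above; the Row type and the rows sequence are placeholders, not something from the question:

var comparer = Comparer<Row>.Create((a, b) => a.Column1.CompareTo(b.Column1));
var best = new List<Row>(11);            // stays sorted ascending, at most 10 items kept

foreach (var current in rows)
{
    // Find where current would go in the already-sorted list.
    int index = best.BinarySearch(current, comparer);
    if (index < 0) index = ~index;       // BinarySearch returns the bitwise complement when not found

    if (index < 10)                      // only insert if it beats one of the current best 10
    {
        best.Insert(index, current);
        if (best.Count > 10)
            best.RemoveAt(10);           // drop the item pushed out of the top 10
    }
}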
Answer 2:
It figures: OrderBy performs a sort, and that requires storing all the elements (deferred execution doesn't help here).
It should work efficiently when data is an IQueryable, because then the ordering is left to the database.
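For illustration only: against a hypothetical Entity Framework context (db and Datum are made-up names), the same query shape is translated by the provider instead of being executed in memory:

// IQueryable<Datum>: OrderBy/Take become SQL, e.g. something like SELECT TOP(10) ... ORDER BY Column1
var top10 = db.Data
    .OrderBy(datum => datum.Column1)
    .Take(10)
    .ToArray();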
// just 4 fun
public static IEnumerable<T> TakeDistinctMin<T, TKey>(this IEnumerable<T> @this,
    int n, Func<T, TKey> selector)
    where TKey : IComparable<TKey>
{
    // Keeps the n items with the smallest (distinct) keys seen so far.
    var tops = new SortedList<TKey, T>(n + 1);
    foreach (var item in @this)
    {
        TKey k = selector(item);
        if (tops.ContainsKey(k))        // duplicate key: keep the first occurrence
            continue;
        if (tops.Count < n)
        {
            tops.Add(k, item);
        }
        else if (k.CompareTo(tops.Keys[tops.Count - 1]) < 0)
        {
            // Better than the current worst: insert, then drop the largest key.
            tops.Add(k, item);
            tops.RemoveAt(n);
        }
    }
    return tops.Values;
}
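A usage sketch, assuming a hypothetical Datum type and that each line of the file parses into one row (the file name and the parsing are made up):

var data = File.ReadLines("data.txt")                               // streams the file lazily
    .Select(line => new Datum { Column1 = double.Parse(line) });    // hypothetical parsing

var smallest10 = data.TakeDistinctMin(10, d => d.Column1).ToArray();

Note that, as the name says, rows whose Column1 duplicates one already kept are skipped.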
Answer 3:
To order a set of unordered objects you have to look at all of them, no?
I don't see how you'd be able to avoid parsing all 9 GB of data to get the first 10 ordered in a certain way unless the 9 GB of data was already ordered in that fashion or if there were indexes or other ancillary data structures that could be utilized.
Could you provide a bit more background on your question? Are you querying a database using LINQ to SQL, Entity Framework, or some other O/RM?
Answer 4:
You can use something like this together with a projection comparer:
public static IEnumerable<T> OrderAndTake<T>(this IEnumerable<T> seq, int count, IComparer<T> comp)
{
    // The set never holds more than count + 1 elements at a time.
    var resultSet = new SortedSet<T>(comp);
    foreach (T elem in seq)
    {
        resultSet.Add(elem);
        if (resultSet.Count > count)
            resultSet.Remove(resultSet.Max);   // evict the current worst element
    }
    return resultSet.Select(x => x);           // hide the underlying set from the caller
}
Runtime should be O(log(count) * seq.Count()) and space O(min(count, seq.Count())).
One issue is that it will break if you have two elements for which comp.Compare(a, b) == 0, since the set doesn't allow duplicate entries.
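For the projection comparer, something along these lines should do; Datum, the file handling, and ParseDatum are assumptions for illustration, not part of the question:

// Projection comparer: order Datum instances by Column1 only.
IComparer<Datum> byColumn1 = Comparer<Datum>.Create(
    (a, b) => a.Column1.CompareTo(b.Column1));

var smallest10 = File.ReadLines("data.txt")
    .Select(ParseDatum)                  // hypothetical string -> Datum parser
    .OrderAndTake(10, byColumn1);

If ties on Column1 are possible, one workaround is to have the comparer break ties on something unique per row (e.g. a line number) so that Compare only returns 0 for the same row.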
Source: https://stackoverflow.com/questions/6076316/memory-optimized-orderby-and-take