I have 9 GB of data, and I want only 10 rows. When I do:
    data.OrderBy(datum => datum.Column1)
        .Take(10)
        .ToArray();
I get an OutOfMemoryException. I would like to use an OrderByAndTake method, optimized for lower memory consumption. It's easy to write, but I guess someone has already done it. Where can I find it?
Edit: It's Linq-to-objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than every value in the current list of the 10 biggest values.
I'm assuming you're doing this in Linq to Objects. You could do something like...
    var best = data
        .Aggregate(new List<T>(), (soFar, current) => soFar
            .Concat(new [] { current })
            .OrderBy(datum => datum.Column1)
            .Take(10)
            .ToList());
In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.
This was the way with the least code. Since you know the soFar list is already sorted, the test for where (or whether) to insert current could be optimized. I didn't feel like doing ALL the work for you. ;-)
PS: Replace T with whatever your type is.
EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of best 10.
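A minimal sketch of that plain foreach approach, assuming the 10 smallest keys are wanted (which is what OrderBy(...).Take(10) returns) and that keys are never null. The name SmallestBy and its parameters are made up for illustration; they are not part of LINQ.

```csharp
using System;
using System.Collections.Generic;

public static class TopNExtensions
{
    // Hypothetical helper (not part of LINQ): returns the n smallest items by key,
    // in ascending key order, while holding at most n + 1 items at any time.
    public static List<T> SmallestBy<T, TKey>(
        this IEnumerable<T> source, int n, Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));

        var keys = new List<TKey>();   // keys of the current best items, ascending
        var items = new List<T>();     // items, kept in the same order as keys
        if (n == 0) return items;

        foreach (var item in source)
        {
            TKey key = keySelector(item);

            // Buffer is full and this key is not smaller than the current worst: discard.
            if (keys.Count == n && key.CompareTo(keys[n - 1]) >= 0)
                continue;

            // Scan from the end for the insertion point; equal keys keep input order.
            int pos = keys.Count;
            while (pos > 0 && key.CompareTo(keys[pos - 1]) < 0)
                pos--;

            keys.Insert(pos, key);
            items.Insert(pos, item);

            // Drop the worst item if the buffer grew past n.
            if (keys.Count > n)
            {
                keys.RemoveAt(n);
                items.RemoveAt(n);
            }
        }
        return items;
    }
}
```

Called as data.SmallestBy(10, datum => datum.Column1), this only helps if data itself is streamed (for example, rows parsed from File.ReadLines) rather than loaded up front. For the 10 biggest values instead, reverse the two comparisons.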
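It figures: OrderBy is a sort, and sorting requires storing all the elements. Execution is still deferred, but the results are not streamed: the whole source is buffered as soon as the first item is requested.
It ought to work efficiently when data is an IQueryable; then it's up to the database.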
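To see the buffering for yourself, here is a small sketch that counts how many source items get pulled when only a few ordered results are requested. The class and method names are made up for the demo, and it assumes take is at least 1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByBufferingDemo
{
    // Hypothetical demo helper: reports how many source items were enumerated
    // in order to produce the first `take` results of OrderBy(...).Take(take).
    public static int CountItemsPulled(int sourceSize, int take)
    {
        int pulled = 0;

        // Lazy source that counts every item it hands out.
        IEnumerable<int> Source()
        {
            for (int i = sourceSize; i > 0; i--)
            {
                pulled++;
                yield return i;
            }
        }

        // Only `take` results are kept, but the sort has to see every item first.
        Source().OrderBy(x => x).Take(take).ToArray();
        return pulled;
    }
}
```

CountItemsPulled(1000, 10) returns 1000, not 10: Take cannot stop OrderBy from reading the entire source.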
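    // just 4 fun
    // Returns the items with the n smallest distinct keys, in ascending key order.
    public static IEnumerable<T> TakeDistinctMin<T, TKey>(this IEnumerable<T> @this,
        int n, Func<T, TKey> selector)
        where TKey : IComparable<TKey>
    {
        if (n <= 0)
            return new T[0];   // nothing to take; also avoids indexing Keys[-1] below

        var tops = new SortedList<TKey, T>(n + 1);
        foreach (var item in @this)
        {
            TKey k = selector(item);
            if (tops.ContainsKey(k))
                continue;   // keep only the first item seen for each key

            if (tops.Count < n)
            {
                tops.Add(k, item);
            }
            else if (k.CompareTo(tops.Keys[tops.Count - 1]) < 0)
            {
                // k beats the current largest key: insert it, then drop the largest
                tops.Add(k, item);
                tops.RemoveAt(n);
            }
        }
        return tops.Values;
    }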
To order a set of unordered objects you have to look at all of them, no?
I don't see how you'd be able to avoid parsing all 9 GB of data to get the first 10 ordered in a certain way unless the 9 GB of data was already ordered in that fashion or if there were indexes or other ancillary data structures that could be utilized.
Could you provide a bit more background on your question? Are you querying a database using LINQ to SQL, Entity Framework, or some other O/RM?
You can use something like this together with a projection comparer:
    public static IEnumerable<T> OrderAndTake<T>(this IEnumerable<T> seq, int count, IComparer<T> comp)
    {
        var resultSet = new SortedSet<T>(comp);
        foreach (T elem in seq)
        {
            resultSet.Add(elem);
            if (resultSet.Count > count)
                resultSet.Remove(resultSet.Max);
        }
        return resultSet.Select(x => x);
    }
Runtime should be O(log(count)*seq.Count()) and space O(min(count,seq.Count())), since the set never holds more than count+1 elements.
One issue is that it silently drops elements when two of them satisfy comp.Compare(a,b)==0: the set doesn't allow duplicate entries, so the second one is never added.
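One possible way around the duplicate-entry issue is to break ties by input position, so no two entries ever compare equal. The sketch below does that with a value tuple. OrderAndTakeWithTies is a made-up name, and the code assumes a C# version with tuple support (7.0 or later) and Comparer<T>.Create (.NET 4.5 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderAndTakeExtensions
{
    // Hypothetical variant of OrderAndTake: ties are broken by input position,
    // so elements that compare equal under `comp` are all kept.
    public static IEnumerable<T> OrderAndTakeWithTies<T>(
        this IEnumerable<T> seq, int count, IComparer<T> comp)
    {
        // Compare with the caller's comparer first, then by position in the input.
        var withIndex = Comparer<(T Item, long Index)>.Create((a, b) =>
        {
            int c = comp.Compare(a.Item, b.Item);
            return c != 0 ? c : a.Index.CompareTo(b.Index);
        });

        var resultSet = new SortedSet<(T Item, long Index)>(withIndex);
        long index = 0;
        foreach (T elem in seq)
        {
            resultSet.Add((elem, index++));
            // Keep the set at `count` entries by evicting the current maximum.
            if (resultSet.Count > count)
                resultSet.Remove(resultSet.Max);
        }
        return resultSet.Select(x => x.Item);
    }
}
```

A projection comparer for the question's data could be built the same way, for instance Comparer<Row>.Create((a, b) => a.Column1.CompareTo(b.Column1)), where Row stands in for whatever the row type is. The cost is one extra long per retained element.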
Source: https://stackoverflow.com/questions/6076316/memory-optimized-orderby-and-take