Memory optimized OrderBy and Take?

走远了吗. Posted on 2019-12-21 17:12:14

Question


I have 9 GB of data, and I want only 10 rows. When I do:

 data.OrderBy(datum => datum.Column1)
     .Take(10)
     .ToArray();

I get an OutOfMemoryException. I would like to use an OrderByAndTake method, optimized for lower memory consumption. It's easy to write, but I guess someone has already done it. Where can I find it?

Edit: It's LINQ to Objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than all of the 10 biggest values seen so far.


Answer 1:


I'm assuming you're doing this in LINQ to Objects. You could do something like...

var best = data
    .Aggregate(new List<T>(), (soFar, current) => soFar
                                                 .Concat(new [] { current })
                                                 .OrderBy(datum => datum.Column1)
                                                 .Take(10)
                                                 .ToList());

This way, the full data set never has to be held in a new sorted collection, only the best 10 items you're interested in.

This is the shortest way to write it. Since you know the soFar list is already sorted, testing whether and where to insert current could be optimized instead of re-sorting on every item. I didn't feel like doing ALL the work for you. ;-)

PS: Replace T with whatever your type is.

EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of best 10.
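
The plain foreach described in the edit above could look roughly like the following. This is only a sketch: TakeSmallest and TopNExtensions are hypothetical names (they are not part of LINQ), and it assumes the key type implements IComparable<TKey>. Like OrderBy(...).Take(n), it returns the smallest keys in ascending order and keeps equal keys in arrival order.

```csharp
using System;
using System.Collections.Generic;

public static class TopNExtensions
{
    // Hypothetical helper (not part of LINQ): returns the `count` smallest
    // items by key, in ascending order, holding at most `count` + 1 items.
    public static List<T> TakeSmallest<T, TKey>(
        this IEnumerable<T> source, int count, Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        if (count <= 0) return new List<T>();

        var best = new List<T>(count);   // kept sorted ascending by key

        foreach (var item in source)
        {
            TKey key = keySelector(item);

            // Fast path: the list is full and this item is no better
            // than the current worst, so it can be discarded immediately.
            if (best.Count == count &&
                key.CompareTo(keySelector(best[best.Count - 1])) >= 0)
            {
                continue;
            }

            // Walk back from the end to find the insertion point. Stopping
            // at the first key <= ours keeps equal keys in arrival order.
            int pos = best.Count;
            while (pos > 0 && key.CompareTo(keySelector(best[pos - 1])) < 0)
            {
                pos--;
            }

            best.Insert(pos, item);
            if (best.Count > count)
            {
                best.RemoveAt(best.Count - 1);   // drop the current worst
            }
        }

        return best;
    }
}
```

With the question's data this would be called as data.TakeSmallest(10, datum => datum.Column1). Only the running list of 10 rows stays in memory, so each row from the file can be garbage collected as soon as it is rejected.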




Answer 2:


It figures: OrderBy is a sort, and a sort has to read and store all the elements before it can return the first one, so deferred execution doesn't help here.
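
A small experiment can make the buffering visible. The sketch below is an illustration only: BufferingDemo and ElementsReadBeforeFirstResult are made-up names, and the lazy Numbers() iterator stands in for rows streamed from a file. It yields values in descending order, so the smallest value arrives last and no implementation can know the first result without reading everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferingDemo
{
    // Counts how many source elements are pulled before
    // OrderBy(...).Take(take) hands back its first result.
    public static int ElementsReadBeforeFirstResult(int total, int take)
    {
        int read = 0;

        // Lazy source: yields total, total - 1, ..., 1 and counts each read.
        IEnumerable<int> Numbers()
        {
            for (int i = total; i > 0; i--)
            {
                read++;
                yield return i;
            }
        }

        using (var e = Numbers().OrderBy(x => x).Take(take).GetEnumerator())
        {
            e.MoveNext();   // ask for the first (smallest) element only
        }

        return read;
    }
}
```

Calling BufferingDemo.ElementsReadBeforeFirstResult(1000, 10) returns 1000: the whole source is consumed, and held by the sort, before the first of the 10 results is produced.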

It ought to work efficiently when data is an IQueryable, because then the query is translated and the sorting is left to the database.


  // just 4 fun
  public static IEnumerable<T> TakeDistinctMin<T, TKey>(this IEnumerable<T> @this,
      int n, Func<T, TKey> selector)
      where TKey : IComparable<TKey>
  {
      // Sorted ascending by key; holds at most n + 1 entries at any time.
      var tops = new SortedList<TKey, T>(n + 1);

      foreach (var item in @this)
      {
          TKey k = selector(item);

          // Keys must be unique in a SortedList, so repeated keys are skipped.
          if (tops.ContainsKey(k))
              continue;

          if (tops.Count < n)
          {
              tops.Add(k, item);
          }
          else if (k.CompareTo(tops.Keys[tops.Count - 1]) < 0)
          {
              // Smaller than the current largest: insert, then evict the largest.
              tops.Add(k, item);
              tops.RemoveAt(n);
          }
      }

      return tops.Values;
  }



Answer 3:


To order a set of unordered objects you have to look at all of them, no?

I don't see how you'd be able to avoid parsing all 9 GB of data to get the first 10 ordered in a certain way unless the 9 GB of data was already ordered in that fashion or if there were indexes or other ancillary data structures that could be utilized.

Could you provide a bit more background on your question? Are you querying a database using LINQ to SQL, Entity Framework, or some other O/RM?




Answer 4:


You can use something like this together with a projection comparer:

public static IEnumerable<T> OrderAndTake<T>(this IEnumerable<T> seq, int count, IComparer<T> comp)
{
  var resultSet = new SortedSet<T>(comp);
  foreach (T elem in seq)
  {
    resultSet.Add(elem);
    if (resultSet.Count > count)
        resultSet.Remove(resultSet.Max);
  }
  return resultSet.Select(x => x);
}

Runtime should be O(seq.Count() * log(count)) and space O(min(count, seq.Count())), since the set never holds more than count + 1 elements.

One issue is that it gives wrong results if you have two elements for which comp.Compare(a, b) == 0: the set doesn't allow duplicate entries, so the second element is silently dropped.
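
One possible way around the duplicate problem is to break ties with each element's position in the input, so that no two set entries ever compare equal. The sketch below assumes a compiler with value tuple support (C# 7 or later); OrderAndTakeKeepingTies and OrderAndTakeExtensions are hypothetical names, not an existing API.

```csharp
using System;
using System.Collections.Generic;

public static class OrderAndTakeExtensions
{
    // Hypothetical variant of OrderAndTake: each element is paired with its
    // input position, so elements that compare equal stay distinct in the set.
    public static List<T> OrderAndTakeKeepingTies<T>(
        this IEnumerable<T> seq, int count, IComparer<T> comp)
    {
        // Compare by value first, then by input position as a tie-breaker.
        var byValueThenIndex = Comparer<(T Value, long Index)>.Create((a, b) =>
        {
            int c = comp.Compare(a.Value, b.Value);
            return c != 0 ? c : a.Index.CompareTo(b.Index);
        });

        var resultSet = new SortedSet<(T Value, long Index)>(byValueThenIndex);
        long index = 0;

        foreach (T elem in seq)
        {
            resultSet.Add((elem, index++));
            if (resultSet.Count > count)
            {
                resultSet.Remove(resultSet.Max);   // evict the current largest
            }
        }

        // Strip the tie-breaker index again, keeping ascending order.
        var result = new List<T>(resultSet.Count);
        foreach (var entry in resultSet)
        {
            result.Add(entry.Value);
        }
        return result;
    }
}
```

Because the later of two equal elements counts as larger, it is the one evicted first, which matches what a stable OrderBy followed by Take would keep. The time and space bounds stay the same as for the original method.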



Source: https://stackoverflow.com/questions/6076316/memory-optimized-orderby-and-take
