Memory-optimized OrderBy and Take?

Submitted on 2019-12-04 07:25:16

I'm assuming you're doing this in LINQ to Objects. You could do something like...

var best = data
    .Aggregate(new List<T>(), (soFar, current) => soFar
        .Concat(new[] { current })        // append the new element
        .OrderBy(datum => datum.Column1)  // re-sort the small running list
        .Take(10)                         // keep only the best 10
        .ToList());

In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.

This is the least-code way. Since you know the soFar list is already sorted, you could optimize the test of where (and whether) to insert current. I didn't feel like doing ALL the work for you. ;-)

PS: Replace T with whatever your type is.
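As a sketch of that optimization (my own, not part of the answer above; the `TopN` name and `key` selector are illustrative), the running list can be kept sorted with a binary-search insert, so each element costs a few comparisons instead of a full re-sort:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TopNSketch
{
    // Sketch only: keeps the running "best" list sorted so each new element
    // costs a binary search plus an insert, rather than re-sorting the list.
    public static List<T> TopN<T, TKey>(IEnumerable<T> data, int n, Func<T, TKey> key)
        where TKey : IComparable<TKey>
    {
        var best = new List<T>(n + 1);    // items, sorted ascending by key
        var keys = new List<TKey>(n + 1); // parallel keys for BinarySearch

        foreach (var item in data)
        {
            TKey k = key(item);

            // List is full and the item can't beat the current worst: skip cheaply.
            if (best.Count == n && k.CompareTo(keys[n - 1]) >= 0)
                continue;

            int idx = keys.BinarySearch(k);
            if (idx < 0) idx = ~idx; // bitwise complement of the insertion point

            best.Insert(idx, item);
            keys.Insert(idx, k);

            if (best.Count > n) // trim the element that fell out of the top n
            {
                best.RemoveAt(n);
                keys.RemoveAt(n);
            }
        }
        return best;
    }
}
```

For example, `TopNSketch.TopN(new[] { 5, 1, 9, 3, 7, 2 }, 3, x => x)` yields 1, 2, 3.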

EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of best 10.

It figures: OrderBy is a sort, and a sort requires buffering all the elements (deferred execution ends there).

It ought to work efficiently when data is an IQueryable; then it's up to the database.
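To illustrate (here with AsQueryable over an in-memory array standing in for a table; against a real provider such as LINQ to SQL the same expression tree is translated server-side):

```csharp
using System;
using System.Linq;

class QueryableDemo
{
    static void Main()
    {
        // In-memory stand-in for a database table; a real provider would
        // translate OrderBy + Take into ORDER BY plus TOP/LIMIT in SQL.
        IQueryable<int> source = new[] { 9, 2, 7, 1, 8, 3 }.AsQueryable();

        var best = source
            .OrderBy(x => x)
            .Take(3)
            .ToList(); // execution is deferred until here

        Console.WriteLine(string.Join(",", best)); // prints 1,2,3
    }
}
```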


    // just 4 fun
    public static IEnumerable<T> TakeDistinctMin<T, TKey>(this IEnumerable<T> @this,
        int n, Func<T, TKey> selector)
        where TKey : IComparable<TKey>
    {
        var tops = new SortedList<TKey, T>(n + 1);

        foreach (var item in @this)
        {
            TKey k = selector(item);

            // SortedList rejects duplicate keys, so ties on the key are skipped.
            if (tops.ContainsKey(k))
                continue;

            if (tops.Count < n)
            {
                tops.Add(k, item);
            }
            else if (k.CompareTo(tops.Keys[tops.Count - 1]) < 0)
            {
                // Beats the current worst: insert it, then drop the (n+1)th entry.
                tops.Add(k, item);
                tops.RemoveAt(n);
            }
        }

        return tops.Values;
    }

To order a set of unordered objects you have to look at all of them, no?

I don't see how you could avoid scanning all 9 GB of data to get the first 10 in a given order, unless the 9 GB was already ordered that way, or there were indexes or other ancillary data structures that could be utilized.

Could you provide a bit more background on your question? Are you querying a database using LINQ to SQL, Entity Framework, or some other O/RM?

You can use something like this together with a projection comparer:

public static IEnumerable<T> OrderAndTake<T>(this IEnumerable<T> seq, int count, IComparer<T> comp)
{
    var resultSet = new SortedSet<T>(comp);
    foreach (T elem in seq)
    {
        resultSet.Add(elem);
        // Keep only the `count` smallest elements seen so far.
        if (resultSet.Count > count)
            resultSet.Remove(resultSet.Max);
    }
    return resultSet.Select(x => x); // hide the set behind a plain IEnumerable
}

Runtime should be O(seq.Count() * log(count)) and space O(min(count, seq.Count())).

One issue is that it will break if you have two elements for which comp.Compare(a, b) == 0, since the set doesn't allow duplicate entries.
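One way around that (a sketch of my own, not part of the answer above; the method name and `key` selector are assumptions) is to pair each element with its arrival index and compare the key first and the index second, so Compare never returns 0 for two distinct elements:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderAndTakeStableSketch
{
    public static IEnumerable<T> OrderAndTakeStable<T, TKey>(
        this IEnumerable<T> seq, int count, Func<T, TKey> key)
        where TKey : IComparable<TKey>
    {
        // Tuples compare by key first, then by arrival index; the index is
        // unique, so elements with duplicate keys no longer collapse.
        var set = new SortedSet<(TKey Key, long Seq, T Item)>(
            Comparer<(TKey Key, long Seq, T Item)>.Create((a, b) =>
            {
                int c = a.Key.CompareTo(b.Key);
                return c != 0 ? c : a.Seq.CompareTo(b.Seq);
            }));

        long n = 0;
        foreach (T elem in seq)
        {
            set.Add((key(elem), n++, elem));
            if (set.Count > count)
                set.Remove(set.Max); // evict the current worst
        }
        return set.Select(t => t.Item);
    }
}
```

With this version, `new[] { 5, 1, 5, 3, 1 }.OrderAndTakeStable(3, x => x)` keeps both 1s and yields 1, 1, 3.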
