Understanding lazy loading optimization in C#

Question

After reading a bit about how yield, foreach, LINQ deferred execution and iterators work in C#, I decided to try optimizing an attribute-based validation mechanism inside a small project. The result:

private IEnumerable<string> GetPropertyErrors(PropertyInfo property)
{
    // where Entity is the current object instance
    string propertyValue = property.GetValue(Entity)?.ToString();

    foreach (var attribute in property.GetCustomAttributes().OfType<ValidationAttribute>())
    {
        if (!attribute.IsValid(propertyValue))
        {
            yield return $"Error: {property.Name} {attribute.ErrorMessage}";
        }
    }
}

// inside another method
foreach(string error in GetPropertyErrors(property))
{
    // Some display/insert log operation
}

I find this slow, but that could also be due to reflection or to the large number of properties being processed.

So my question is: is this an optimal, or at least a good, use of the lazy loading mechanism? Or am I missing something and just wasting tons of resources?

NOTE: The intention of the code itself is not important; my concern is the use of lazy loading in it.

Answer 1:

Lazy loading is not something specific to C# or to Entity Framework. It's a common pattern that lets you defer loading some data. Deferring means not loading immediately. Some examples of when you need that:

Loading images in a (Word) document. The document may be big and contain thousands of images. If you load all of them when the document is opened, it might take a long time. Nobody wants to sit and watch a document load for 30 seconds. The same approach is used in web browsers: resources are not sent with the body of the page, and the browser defers loading them.
Loading graphs of objects. These may be objects from a database, file system objects, etc. Loading the full graph might amount to loading the entire database into memory. How long would that take? Is it efficient? No. If you are building a file system explorer, will you load info about every file in the system before you start using it? It's much faster to load info about the current directory only (and probably its direct children).

Lazy loading does not always mean deferring loading until you really need the data. Loading might occur on a background thread before you really need that data. E.g. you might never scroll to the bottom of a web page to see the footer image. Lazy loading means only deferring. And C# enumerators can help you with that. Consider getting the list of files in a directory:

string[] files = Directory.GetFiles("D:");
IEnumerable<string> filesEnumerator = Directory.EnumerateFiles("D:");

The first approach returns an array of file names. That means the directory has to gather all of its files and save their names to the array before you can get even the first file name. It's like loading all the images before you see the document.

The second approach uses an enumerator: it returns file names one by one as you ask for the next one. The enumerator is returned immediately, without getting all the files and saving them to a collection, and you can process files one by one as you need them. Here, getting the file list is deferred.

But you should be careful. If the underlying operation is not deferred, then returning an enumerator gives you no benefit. E.g.
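To make the difference between the two approaches concrete without touching a real disk, here is a minimal sketch. GetNames and EnumerateNames are hypothetical stand-ins for Directory.GetFiles and Directory.EnumerateFiles, and the Produced counter is only there to show how much work has been done by the time the first name is available.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerVersusDeferred
{
    // Counts how many names the simulated source has produced so far.
    public static int Produced;

    // Hypothetical stand-in for Directory.GetFiles: fills the whole array first.
    public static string[] GetNames(int count)
    {
        var names = new string[count];
        for (int i = 0; i < count; i++)
        {
            Produced++;
            names[i] = $"file{i}.txt";
        }
        return names;
    }

    // Hypothetical stand-in for Directory.EnumerateFiles: one name per request.
    public static IEnumerable<string> EnumerateNames(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            yield return $"file{i}.txt";
        }
    }

    public static void Main()
    {
        Produced = 0;
        string firstEager = GetNames(1000)[0];
        Console.WriteLine($"{firstEager}: eager source produced {Produced} names");

        Produced = 0;
        string firstDeferred = EnumerateNames(1000).First();
        Console.WriteLine($"{firstDeferred}: deferred source produced {Produced} names");
    }
}
```

In this sketch the eager version produces all 1000 names before the first one can be read, while the deferred version produces only one.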

public IEnumerable<string> EnumerateFiles(string path)
{
    foreach(string file in Directory.GetFiles(path))
        yield return file;
}

Here you use the GetFiles method, which fills an array of file names before returning it. So yielding the files one by one gives you no speed benefit.

By the way, in your case you have exactly the same problem: the GetCustomAttributes extension method internally uses the Attribute.GetCustomAttributes method, which returns an array of attributes. So you will not reduce the time taken to get the first result.
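The effect of wrapping an eager source in an iterator can be checked with a small sketch. GetNames is again a hypothetical eager source standing in for Directory.GetFiles, and Produced is an illustrative counter, not part of any real API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EagerSourceBehindIterator
{
    public static int Produced;

    // Hypothetical eager source, standing in for Directory.GetFiles.
    public static string[] GetNames(int count)
    {
        var names = new string[count];
        for (int i = 0; i < count; i++)
        {
            Produced++;
            names[i] = $"file{i}.txt";
        }
        return names;
    }

    // Looks lazy, but the first MoveNext call runs GetNames to completion.
    public static IEnumerable<string> EnumerateNames(int count)
    {
        foreach (string name in GetNames(count))
            yield return name;
    }

    public static void Main()
    {
        Produced = 0;
        IEnumerable<string> names = EnumerateNames(1000);
        Console.WriteLine($"after the call: {Produced}");        // nothing has run yet
        string first = names.First();
        Console.WriteLine($"after First() ({first}): {Produced}"); // the whole array was built
    }
}
```

The call itself does no work, but asking for the first item pays for all 1000, so the wrapper only moves the cost; it does not reduce it.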

Answer 2:

This isn't quite how the term "lazy loading" is generally used in .NET. "Lazy loading" is most often used for something like:

public SomeType SomeValue
{
  get
  {
    if (_backingField == null)
      _backingField = RelativelyLengthyCalculationOrRetrieval();
    return _backingField;
  }
}

As opposed to just having _backingField set when an instance was constructed. Its advantage is that it costs nothing in the cases when SomeValue is never accessed, at the expense of a slightly greater cost when it is. It's therefore advantageous when the chances of SomeValue not being called are relatively high, and generally disadvantageous otherwise with some exceptions (when we might care about how quickly things are done in between instance creation and the first call to SomeValue).
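For comparison, .NET also ships a Lazy<T> type that packages this pattern. The sketch below is only an illustration under assumed names (Report, BuildSummary and the Calculations counter are hypothetical). Unlike the null check above, Lazy<T> does not recalculate when the computed value happens to be null, and its default mode is thread-safe.

```csharp
using System;

public sealed class Report
{
    // Illustrative counter showing how often the expensive step runs.
    public static int Calculations;

    // The factory delegate runs only on the first access to Value.
    private readonly Lazy<string> _summary = new Lazy<string>(BuildSummary);

    public string Summary => _summary.Value;

    public bool SummaryCreated => _summary.IsValueCreated;

    // Hypothetical stand-in for a relatively lengthy calculation or retrieval.
    private static string BuildSummary()
    {
        Calculations++;
        return "summary";
    }
}

public static class LazyDemo
{
    public static void Main()
    {
        var report = new Report();
        Console.WriteLine(report.SummaryCreated); // nothing calculated yet
        Console.WriteLine(report.Summary);        // first access runs BuildSummary
        Console.WriteLine(report.Summary);        // second access reuses the stored value
        Console.WriteLine(Report.Calculations);
    }
}
```

If Summary is never read, BuildSummary never runs, which is the case where lazy loading pays off.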

Here we have deferred execution. It's similar, but not quite the same. When you call GetPropertyErrors(property), rather than receiving a collection of all of the errors, you receive an object that can find those errors when asked for them.

It will reduce the time taken to get the first such item, because you can act on it immediately rather than waiting until the whole sequence has been processed.

It will always reduce memory use, because it isn't spending memory on a collection.

It will also save time in total, because no time is spent creating a collection.
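A minimal sketch of this behaviour, with reflection left out so that it stays self-contained: the Rules array and the ChecksRun counter are hypothetical stand-ins for the validation attributes in the question, not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredValidation
{
    // Illustrative counter of how many rules have actually been evaluated.
    public static int ChecksRun;

    // Hypothetical rules: each returns an error message, or null when the value passes.
    private static readonly Func<string, string>[] Rules =
    {
        v => string.IsNullOrEmpty(v) ? "is required" : null,
        v => v != null && v.Length > 5 ? "is too long" : null,
        v => v != null && v.Contains(" ") ? "must not contain spaces" : null,
    };

    public static IEnumerable<string> GetErrors(string name, string value)
    {
        foreach (var rule in Rules)
        {
            ChecksRun++;
            string message = rule(value);
            if (message != null)
                yield return $"Error: {name} {message}";
        }
    }

    public static void Main()
    {
        ChecksRun = 0;
        IEnumerable<string> errors = GetErrors("Title", "far too long");
        Console.WriteLine($"checks after the call: {ChecksRun}");

        string first = errors.First();
        Console.WriteLine($"{first} (checks so far: {ChecksRun})");
    }
}
```

Calling GetErrors runs no checks at all. Asking for the first error runs only the first two rules, and the third is never evaluated because the caller stopped enumerating.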

However, if you need to access the results more than once, then while a collection would still hold the same results, the deferred sequence will have to calculate them all again (unlike lazy loading, which loads its results and stores them for subsequent reuse).

If you're rarely going to want to hit the same set of results again, it's generally a win.

If you're almost always going to want to hit the same set of results again, it's generally a loss.

If you are sometimes going to want to hit the same set of results again, though, you can pass the decision on whether to cache up to the caller: a single use calls GetPropertyErrors() and acts on the results directly, while a repeated use calls ToList() on that and then acts repeatedly on the list.

As such, the approach of not returning a list is the more flexible, allowing the calling code to decide which approach is the more efficient for its particular use of it.
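The cost of enumerating twice, and the effect of letting the caller call ToList(), can be sketched as follows. Squares and the Calculations counter are hypothetical and stand in for any deferred sequence that is expensive to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReenumerationCost
{
    // Illustrative counter of how many items have been calculated.
    public static int Calculations;

    // Hypothetical deferred sequence: each item is calculated when requested.
    public static IEnumerable<int> Squares(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Calculations++;
            yield return i * i;
        }
    }

    public static void Main()
    {
        // Two passes over the deferred sequence calculate everything twice.
        Calculations = 0;
        IEnumerable<int> deferred = Squares(3);
        Console.WriteLine($"sum {deferred.Sum()}, max {deferred.Max()}, calculations {Calculations}");

        // The caller caches once with ToList(); later passes read the list.
        Calculations = 0;
        List<int> cached = Squares(3).ToList();
        Console.WriteLine($"sum {cached.Sum()}, max {cached.Max()}, calculations {Calculations}");
    }
}
```

Both variants report the same sum and maximum, but the deferred one performs six calculations for its two passes and the cached one performs three.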

You could also combine it with lazy loading:

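// Cached results; stays null until one enumeration has run to completion.
private List<string> _store;

private IEnumerable<string> LazyLoadedEnumerator()
{
  // No complete set of results stored yet: calculate (and store) them.
  if (_store == null)
    return StoringCalculatingEnumerator();
  return _store;
}

private IEnumerable<string> StoringCalculatingEnumerator()
{
  List<string> store = new List<string>();
  foreach(string str in SomethingThatCalculatesTheseStrings())
  {
    yield return str;
    store.Add(str);
  }
  // Only reached if the caller enumerated to the end, so a partial
  // enumeration never leaves an incomplete list in _store.
  _store = store;
}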

This combination is rarely useful in practice though.

As a rule, start with deferred evaluation as the normal approach and decide further up the call chain whether to store the results or not. An exception, though, is if you can know the size of the results before you begin (you can't here, because you don't know whether an element will be added or not until you've examined the property). In that case there is the possibility of a performance improvement in just how you create that list, because you can set its capacity ahead of time. This, though, is a micro-optimisation that is only applicable if you also know that you'll always want to work on a list, and it doesn't save that much in the grand scheme of things.
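For the case where the size is known up front, a sketch of setting the capacity ahead of time might look like this. ToUpperAll is a hypothetical helper chosen because it produces exactly one output per input; it does not apply to the validation code in the question, where the number of errors is unknown.

```csharp
using System;
using System.Collections.Generic;

public static class PresizedList
{
    // Hypothetical helper: one output per input, so the final size is known.
    public static List<string> ToUpperAll(IReadOnlyList<string> source)
    {
        // Allocate the backing array once instead of growing it while adding.
        var result = new List<string>(source.Count);
        foreach (string item in source)
            result.Add(item.ToUpperInvariant());
        return result;
    }

    public static void Main()
    {
        var names = new[] { "alpha", "beta", "gamma" };
        List<string> upper = ToUpperAll(names);
        Console.WriteLine($"{string.Join(", ", upper)} (capacity {upper.Capacity})");
    }
}
```

The saving is a few avoided reallocations of the list's internal array, which is why it only matters when a list is needed anyway.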

Source: https://stackoverflow.com/questions/35153049/understanding-lazy-loading-optimization-in-c-sharp

Tags

linq

lazy-loading

deferred-execution