C# generic, covering both arrays and lists?

春和景丽 2021-01-04 01:10

Here's a very handy extension, which works for an array of anything:

public static T AnyOne<T>(this T[] ra) where T : class
{
    int k = ra         


        
6 answers
  •  萌比男神i
    2021-01-04 01:59

    It's interesting how some people choose IEnumerable<T>, while others insist on IReadOnlyList<T>.

    Now let's be honest. IEnumerable<T> is useful, very useful. In most cases you just want to put this method in some library, throw your utility function at whatever you think is a collection, and be done with it. However, using IEnumerable<T> correctly is a bit tricky, as I'll point out here...

    IEnumerable

    Let's assume for a second that the OP is using LINQ and wants to get a random element from a sequence. Basically they end up with the code from @Yannick, which lands in the library of utility helper functions:

    public static T AnyOne<T>(this IEnumerable<T> source)
    {
        int endExclusive = source.Count(); // #1
        // Presumably UnityEngine.Random.Range; its int overload excludes the upper bound.
        int randomIndex = Random.Range(0, endExclusive);
        return source.ElementAt(randomIndex); // #2
    }
    

    Now, what this basically does is 2 things:

    1. Count the number of elements in the source. If the source is a plain IEnumerable<T>, this means walking through every element; if it's, for example, a List<T>, Count() uses the Count property instead.
    2. Start enumerating again from the beginning, advance to element randomIndex, grab it and return it.
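
    To make the two passes visible, here is a minimal sketch. The Numbers iterator and its Passes counter are hypothetical helpers, not part of the original code; they only record how often the sequence is started.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class DoubleEnumerationDemo
    {
        // Hypothetical counter, only here to make the passes visible.
        public static int Passes;

        // A lazy sequence: the body runs again for every new enumeration.
        public static IEnumerable<int> Numbers()
        {
            Passes++;
            for (int i = 0; i < 5; i++)
            {
                yield return i;
            }
        }

        public static void Main()
        {
            IEnumerable<int> source = Numbers();
            int count = source.Count();       // first pass over the sequence
            int third = source.ElementAt(2);  // second pass, starting over
            Console.WriteLine($"count={count}, third={third}, passes={Passes}");
        }
    }
    ```

    With a List<int> in place of the iterator, both calls would take the indexed shortcut and no enumeration would happen at all.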

    There are two things that can go wrong here. First of all, your IEnumerable might be a slow, sequential storage, and doing Count can ruin the performance of your application in an unexpected way. For example, streaming from a device might get you into trouble. That said, you could very well argue that's to be expected when that's inherent to the characteristic of the collection - and personally I'd say that argument will hold.

    Secondly, and this is perhaps even more important, there's no guarantee that your enumerable will return the same sequence on every iteration (and therefore there's also no guarantee that your code won't crash). For example, consider this innocent-looking piece of code, which might be useful for testing purposes:

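    IEnumerable<int> GenerateRandomDataset()
    {
        Random rnd = new Random();
        int count = rnd.Next(10, 100); // randomize number of elements
        for (int i = 0; i < count; ++i)
        {
            // The loop body was cut off in the source; yielding a random int is an assumption.
            yield return rnd.Next();
        }
    }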

    On the first iteration (calling Count()), you might generate 99 results. You pick element 98. Next you call ElementAt, the second iteration generates only 12 results, and your application crashes. Not cool.
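
    The failure can be reproduced deterministically. The sketch below swaps the random length for a hypothetical Unstable sequence that yields 99 elements on its first enumeration and 12 afterwards; the names and numbers are illustrative assumptions only.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class UnstableSequenceDemo
    {
        // Hypothetical pass counter; public so it can be reset.
        public static int Pass;

        // 99 elements on the first enumeration, 12 on every later one.
        public static IEnumerable<int> Unstable()
        {
            int count = (Pass++ == 0) ? 99 : 12;
            for (int i = 0; i < count; i++)
            {
                yield return i;
            }
        }

        public static void Main()
        {
            IEnumerable<int> source = Unstable();
            int n = source.Count();           // enumeration #1 sees 99 elements
            try
            {
                source.ElementAt(n - 1);      // enumeration #2 sees only 12
            }
            catch (ArgumentOutOfRangeException)
            {
                Console.WriteLine($"Index {n - 1} no longer exists.");
            }
        }
    }
    ```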

    Fixing the IEnumerable implementation

    As we've seen, the issue with the IEnumerable<T> implementation is that you have to go through the data twice. We can fix that by going through the data a single time.

    The 'trick' here is actually pretty simple: when we have seen 1 element, we definitely want to consider returning it. When we see the second element, there's a 50%/50% chance that this is the element we would have returned, so we replace our pick with probability 1/2. When we see the third element, it's 33%/33%/33%, so we replace our pick with probability 1/3. And so on.

    Therefore, a better implementation might be this one:

    public static T AnyOne<T>(this IEnumerable<T> source)
    {
        Random rnd = new Random();
        double count = 1;
        T result = default(T);
        foreach (var element in source)
        {
            if (rnd.NextDouble() <= (1.0 / count)) 
            {
                result = element;
            }
            ++count;
        }
        return result;
    }
    

    On a side note: if we're using LINQ, we would expect operations to use the IEnumerable<T> once (and only once!). Now you know why.
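
    This single-pass approach is commonly known as reservoir sampling with a reservoir of size one. Element i is picked with probability 1/i and then survives each later step j with probability (1 - 1/j), which multiplies out to 1/n for every element. The sketch below checks that empirically; PickOne, Histogram and the seeded Random parameter are assumptions added to make the experiment repeatable.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class SinglePassPickDemo
    {
        // Same idea as AnyOne above, with the Random passed in (an assumption).
        public static T PickOne<T>(IEnumerable<T> source, Random rnd)
        {
            int seen = 0;
            T result = default(T);
            foreach (T element in source)
            {
                seen++;
                // Keep the n-th element with probability 1/n.
                if (rnd.Next(seen) == 0)
                {
                    result = element;
                }
            }
            return result;
        }

        // Counts how often each of the values 0..items-1 gets picked.
        public static int[] Histogram(int items, int trials, int seed)
        {
            var rnd = new Random(seed);
            var hits = new int[items];
            for (int t = 0; t < trials; t++)
            {
                // Enumerable.Range is lazy, so this exercises the foreach path.
                hits[PickOne(Enumerable.Range(0, items), rnd)]++;
            }
            return hits;
        }

        public static void Main()
        {
            int[] hits = Histogram(items: 4, trials: 100000, seed: 1);
            Console.WriteLine(string.Join(", ", hits)); // each bucket should land near 25000
        }
    }
    ```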

    Making it work with lists and arrays

    While this is a neat trick, our performance will now be worse if we work on a List<T>, which doesn't make any sense, because we know a much better implementation is available: indexing and Count are both at our disposal.

    What we're looking for is the common denominator for this better solution, one that's used in as many collections as we can find. The thing we'll end up with is the IReadOnlyList<T> interface, which exposes everything we need.

    Because of the properties that we know to be true for IReadOnlyList<T>, we can now safely use Count and indexing, without running the risk of crashing the application.

    However, while IReadOnlyList<T> seems appealing, IList<T> for some reason does not extend it... which basically means that IReadOnlyList<T> is a bit of a gamble in practice. In that respect, I'm pretty sure there are a lot more IList<T> implementations out there than IReadOnlyList<T> implementations. It therefore seems best to simply support both interfaces.
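
    The interface relationships can be checked with reflection. This is a minimal sketch; IsReadOnlyList is a hypothetical helper. On the .NET versions I'm aware of, it reports True for arrays and for List<T>, and False for the IList<T> interface itself.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class InterfaceCheckDemo
    {
        // True when every TCandidate can be used as an IReadOnlyList<int>.
        public static bool IsReadOnlyList<TCandidate>()
        {
            return typeof(IReadOnlyList<int>).IsAssignableFrom(typeof(TCandidate));
        }

        public static void Main()
        {
            Console.WriteLine(IsReadOnlyList<int[]>());       // arrays
            Console.WriteLine(IsReadOnlyList<List<int>>());   // List<T>
            Console.WriteLine(IsReadOnlyList<IList<int>>());  // the IList<T> interface itself
        }
    }
    ```

    A custom collection that implements only IList<T> would therefore miss a fast path that checks for IReadOnlyList<T> alone.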

    This leads us to the solution here:

    public static T AnyOne<T>(this IEnumerable<T> source)
    {
        var rnd = new Random();
        var list = source as IReadOnlyList<T>;
        if (list != null)
        {
            int index = rnd.Next(0, list.Count);
            return list[index];
        }
    
        var list2 = source as IList<T>;
        if (list2 != null)
        {
            int index = rnd.Next(0, list2.Count);
            return list2[index];
        }
        else
        {
            double count = 1;
            T result = default(T);
            foreach (var element in source)
            {
                if (rnd.NextDouble() <= (1.0 / count))
                {
                    result = element;
                }
                ++count;
            }
            return result;
        }
    }
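
    As a usage sketch, here is a condensed restatement of the method together with three kinds of callers. The class names, the shared static Random and the pattern-matching syntax are my own choices, not part of the code above. Like the original, it does not guard against empty indexed collections, which would throw.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class RandomPickExtensions
    {
        // One shared generator for this sketch; not thread-safe.
        private static readonly Random Rnd = new Random();

        public static T AnyOne<T>(this IEnumerable<T> source)
        {
            if (source is IReadOnlyList<T> readOnly)
            {
                return readOnly[Rnd.Next(readOnly.Count)];
            }
            if (source is IList<T> list)
            {
                return list[Rnd.Next(list.Count)];
            }
            // Fallback: single pass, keeping the n-th element with probability 1/n.
            int seen = 0;
            T result = default(T);
            foreach (T element in source)
            {
                if (Rnd.Next(++seen) == 0) result = element;
            }
            return result;
        }
    }

    public static class AnyOneUsage
    {
        public static void Main()
        {
            string[] array = { "a", "b", "c" };
            var list = new List<int> { 1, 2, 3 };
            IEnumerable<int> lazy = Enumerable.Range(10, 5).Where(n => n % 2 == 0);

            Console.WriteLine(array.AnyOne()); // indexed path
            Console.WriteLine(list.AnyOne());  // indexed path
            Console.WriteLine(lazy.AnyOne());  // single-pass path
        }
    }
    ```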
    

    PS: For more complex scenarios, check out the Strategy Pattern.

    Random

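    @Yannick Motton made the remark that you have to be careful with Random, because it won't be really random if you call methods like this many times in quick succession. On .NET Framework, a parameterless Random is seeded from the system clock (Environment.TickCount), so instances created within the same tick share a seed and produce the same sequence. Newer .NET (Core) versions seed each instance differently, so the problem is specific to the older runtime.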

    A simple way around this is as follows:

    private static int seed = 12873; // some number or a timestamp.
    
    // ...
    
    // initialize random number generator:
    Random rnd = new Random(Interlocked.Increment(ref seed));
    

    This way, every time you call AnyOne, the random number generator will receive another seed and it will work even in tight loops.
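
    A self-contained sketch of that seeding scheme follows. NextSeed and NextRandom are hypothetical helper names, and the starting value is arbitrary. Sharing a single Random instance behind a lock is another common choice.

    ```csharp
    using System;
    using System.Threading;

    public static class SeededRandomDemo
    {
        private static int seed = 12873; // arbitrary starting value

        // Atomically hands out a fresh seed, so concurrent callers never share one.
        public static int NextSeed()
        {
            return Interlocked.Increment(ref seed);
        }

        // Each call gets a generator with its own seed, even in a tight loop.
        public static Random NextRandom()
        {
            return new Random(NextSeed());
        }

        public static void Main()
        {
            Random a = NextRandom();
            Random b = NextRandom();
            // Generators built from different seeds are expected to diverge.
            Console.WriteLine($"{a.Next()} {b.Next()}");
        }
    }
    ```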

    To summarize:

    • An IEnumerable<T> should be iterated once, and only once. Doing otherwise might give the user unexpected results.
    • If you have access to better capabilities than simple enumeration, it's not necessary to go through all the elements. Best to grab the right result right away.
    • Consider very carefully which interfaces you're checking. While IReadOnlyList<T> is definitely the best candidate, IList<T> does not inherit from it, which makes it less effective in practice.

    The end result is something that Just Works.
