What is the arithmetic mean of an empty sequence?

Submitted by 梦想与她 on 2021-02-07 04:41:53

Question


Disclaimer: No, I didn't find any obvious answer, contrary to what I expected!

When looking for code examples of the arithmetic mean, the first several examples I can turn up via Google seem to be defined such that the empty sequence yields a mean value of 0.0 (e.g. here and here ...).

Looking at Wikipedia, however, the arithmetic mean is defined as

 $$A = \frac{1}{n} \sum_{i=1}^{n} a_i$$

so for an empty sequence (n = 0) it would yield 0.0 / 0, which is possibly NaN in the general case.
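Transcribed literally into C++, the formula has no special case for n = 0, so an empty input reaches 0.0 / 0.0. This is a minimal sketch: `naive_mean` is a hypothetical name, and it assumes IEEE 754 doubles (checked by the `static_assert`) and no fast-math compiler flags.

```cpp
#include <limits>
#include <numeric>
#include <vector>

// Assumption: IEEE 754 (IEC 559) doubles, where 0.0 / 0.0 yields NaN.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch relies on IEEE 754 division semantics");

// Hypothetical helper: a literal transcription of A = (1/n) * sum(a_i)
// with no special case for n == 0. For an empty vector the sum is 0.0
// and n is 0.0, so the result is 0.0 / 0.0.
double naive_mean(const std::vector<double>& values) {
    const double sum = std::accumulate(values.begin(), values.end(), 0.0);
    const double n = static_cast<double>(values.size());
    return sum / n;
}
```

Under those assumptions `naive_mean({1.0, 2.0, 3.0})` returns 2.0, and `naive_mean({})` returns NaN rather than 0.0.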

So if I write a utility function that calculates the arithmetic mean of a set of floating point values, should I, in the general case:

  • return 0. for the empty sequence?
  • return (Q)NaN for the empty sequence?
  • "throw an exception" in case of empty sequence?

Answer 1:


There isn't an obvious answer because the handling depends on how you want to inform calling code of the error. (Or even whether you want to interpret this as an "error" at all.)

Some libraries/programs really don't like raising exceptions, so they do everything with signal values. In that case, returning NaN (because the value of the expression is technically undefined) is a reasonable choice.

You might also want to return NaN if you want to "silently" bring the value forward through multiple other calculations. (Relying on the behavior that NaN combined with anything else is "silently" NaN.)

But note that if you return NaN for the mean of an empty sequence, you impose the burden on calling code that they need to check the return value of the function to make sure that it isn't NaN - either immediately upon return or later on. This is a requirement that is easy to miss, depending on how fastidious you are in checking return values.
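A minimal C++ sketch of the NaN convention, showing both the silent propagation through later arithmetic and how an unchecked NaN slips through a comparison. `mean_or_nan`, `normalized`, `passes_unchecked`, `grade_checked` and `Verdict` are hypothetical names, and IEEE 754 semantics without fast-math flags are assumed.

```cpp
#include <cmath>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper following the "signal value" convention.
double mean_or_nan(const std::vector<double>& values) {
    if (values.empty()) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Hypothetical downstream arithmetic: NaN in, NaN out, no error raised.
double normalized(const std::vector<double>& scores) {
    return (mean_or_nan(scores) - 50.0) / 10.0;
}

// Forgotten check: every ordered comparison with NaN is false, so an
// empty input is quietly reported as a failure rather than as "no data".
bool passes_unchecked(const std::vector<double>& scores, double threshold) {
    return mean_or_nan(scores) >= threshold;
}

enum class Verdict { Pass, Fail, NoData };

// The check that the NaN convention pushes onto the caller.
Verdict grade_checked(const std::vector<double>& scores, double threshold) {
    const double m = mean_or_nan(scores);
    if (std::isnan(m)) {
        return Verdict::NoData;
    }
    return m >= threshold ? Verdict::Pass : Verdict::Fail;
}
```

Here `normalized({})` is NaN and `passes_unchecked({}, 50.0)` is simply `false`, which looks exactly like a genuine failing average unless the caller adds the `std::isnan` check.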

Because of this, other libraries/programs take the viewpoint that error conditions should be "noisy": if you passed an empty sequence to a function that's finding the mean of the sequence, then you're obviously doing something majorly wrong, and it should be made abundantly clear to you that you've messed up.

Of course, if exceptions can be raised, they need to be handled, but you can do that at a higher level, potentially centralized at the point where it makes the most sense. Depending on your program, this may be easier, or more in line with your standard error-handling scheme, than double-checking return values.
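A sketch of the "noisy" alternative. It assumes `std::invalid_argument` as the exception type (the answer does not name one), and `mean_or_throw`, `spread_of_means` and `report` are hypothetical names. The intermediate function does no checking at all; only `report` handles the error.

```cpp
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: throws instead of returning a signal value, so the
// empty case cannot be silently ignored.
double mean_or_throw(const std::vector<double>& values) {
    if (values.empty()) {
        throw std::invalid_argument("mean of an empty sequence is undefined");
    }
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Hypothetical intermediate code: no checks; any exception passes through.
double spread_of_means(const std::vector<double>& a,
                       const std::vector<double>& b) {
    return mean_or_throw(a) - mean_or_throw(b);
}

// Hypothetical higher-level handler: the one place that decides what an
// empty input means for this program.
std::string report(const std::vector<double>& a, const std::vector<double>& b) {
    try {
        return "difference: " + std::to_string(spread_of_means(a, b));
    } catch (const std::invalid_argument& e) {
        return std::string("error: ") + e.what();
    }
}
```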

Other people would argue that your functions should be robust to the error. For maximum robustness, you probably shouldn't use either NaN or an exception - you need to choose an actual number which "makes sense" as a value for the average of an empty list.

Which value to use is highly specific to your use case. For example, if your sequence is a list of differences/errors, you might want to return 0. If you're averaging test scores (scored 0-100), you might want to return 100 for an empty list ... or 0, depending on what your philosophy of the "starting" score is. It all depends on what the return value is going to be used for.

Given that this "neutral" value is going to vary widely with the exact use case, you might want to implement it as two functions: one general function which returns NaN or raises an exception, and another that wraps the general function and recognizes the "error" case. This way you can have multiple versions, each with a different "default" case. Or, if this is something you're doing a lot of, you might even make the "default" value a parameter you can pass.
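One way to sketch the two-layer design in C++, assuming the general function throws; `mean` and `mean_or` are hypothetical names, and the default is passed as a parameter.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical general function: refuses to invent a value for the empty case.
double mean(const std::vector<double>& values) {
    if (values.empty()) {
        throw std::invalid_argument("mean of an empty sequence is undefined");
    }
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}

// Hypothetical wrapper: recognizes the empty case up front and substitutes
// a caller-chosen "neutral" value; everything else is delegated to mean().
double mean_or(const std::vector<double>& values, double fallback) {
    return values.empty() ? fallback : mean(values);
}
```

For instance, `mean_or(differences, 0.0)` suits a list of errors and `mean_or(test_scores, 100.0)` the test-score case, while code that never expects an empty list can keep calling `mean` directly and let the exception surface.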

Again, there isn't a single answer to this question: the average of an empty sequence is undefined. How you want to handle it depends intimately on what the result of the calculation is being used for: Just display, or further calculation? Should an empty list be exceptional, or should it be handled quietly? Do you want to handle the special case at the point in time it occurs, or do you want to hoist/defer the error handling?




Answer 2:


Mathematically, it's undefined as the denominator is zero.

Because the behaviour of integer division by zero is undefined in C++, throw an exception if you're working in integral types.
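A sketch of the integral case. It assumes 64-bit signed values and `std::domain_error` as the exception type (both are choices made here, not taken from the answer), and `integer_mean` is a hypothetical name. Overflow of the running sum is not handled.

```cpp
#include <cstdint>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical helper: integer division by zero is undefined behaviour in
// C++, so the empty case must be rejected before the division happens.
std::int64_t integer_mean(const std::vector<std::int64_t>& values) {
    if (values.empty()) {
        throw std::domain_error("integer mean of an empty sequence");
    }
    const std::int64_t sum =
        std::accumulate(values.begin(), values.end(), std::int64_t{0});
    // Truncates toward zero, like any C++ integer division.
    return sum / static_cast<std::int64_t>(values.size());
}
```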

If you're working in IEEE 754 floating point, then return NaN, since the numerator will also be zero. (+Inf would be returned if the numerator were positive, and -Inf if it were negative.)
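The three IEEE 754 outcomes can be checked with a small sketch. `divide_sum_by_count` and `classify` are hypothetical helpers, and IEEE 754 doubles are assumed. For a truly empty sequence the sum is always 0.0, so only the NaN case can arise from a mean; the infinities appear only when a nonzero numerator is divided by a zero count.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>

static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Hypothetical helper: the final step of a naive mean. Taking the operands
// as parameters keeps the division at run time.
double divide_sum_by_count(double sum, std::size_t count) {
    return sum / static_cast<double>(count);
}

// Hypothetical helper: names the IEEE 754 outcome of a division.
std::string classify(double x) {
    if (std::isnan(x)) return "NaN";
    if (std::isinf(x)) return x > 0 ? "+Inf" : "-Inf";
    return "finite";
}
```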




Answer 3:


I suggest keeping the same behavior as for a 0.0 / 0 division, whatever it is. Indeed, one can adopt the as-if rule. This way you remain coherent with other operations and you don't have to make the decision yourself.

(You could even implement it as such, by returning 0.0/0, but the compiler might optimize this in unexpected ways.)
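Following the same idea without writing a literal `0.0/0`, a sketch can return `std::numeric_limits<double>::quiet_NaN()`, which is what 0.0 / 0 produces under IEEE 754. The function name is hypothetical, and IEEE 754 doubles are assumed.

```cpp
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper: gives the same result that 0.0 / 0 yields under
// IEEE 754, but states the intent explicitly instead of writing a literal
// division by zero, which C++ does not define in general and which is
// rejected inside constant expressions.
double mean_as_if_division(const std::vector<double>& values) {
    if (values.empty()) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```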




Answer 4:


I like defensive coding, so I would throw an exception. You can make it either a specific exception (like empty_sequence_exception) or a division-by-zero exception, since the divisor is the length of the sequence, which is 0.

0.0 is debatable since there is no data (sequence).
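A sketch of the dedicated-exception option, reusing the answer's `empty_sequence_exception` name as a hypothetical class; `checked_mean` is also hypothetical. Deriving it from `std::domain_error` is an assumption made here, since standard C++ does not throw any exception on division by zero that could be reused.

```cpp
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical exception type named after the answer's example. Deriving
// from std::domain_error lets generic handlers catch it as well.
class empty_sequence_exception : public std::domain_error {
public:
    empty_sequence_exception()
        : std::domain_error("arithmetic mean of an empty sequence") {}
};

// Hypothetical helper that fails loudly on empty input.
double checked_mean(const std::vector<double>& values) {
    if (values.empty()) {
        throw empty_sequence_exception();
    }
    return std::accumulate(values.begin(), values.end(), 0.0) /
           static_cast<double>(values.size());
}
```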




Answer 5:


The correct answer is that the arithmetic mean of an empty sequence has no meaning, since an empty sequence is essentially an empty set. Division of nothing is meaningless. Zero is certainly not a correct answer. Say a sequence has 3 members, 1, 0 and -1, or is a sequence of all zeros. The mean of both of these would be zero, and should not be confused with an empty sequence.
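To make the answer's point concrete, here is a sketch comparing two hypothetical conventions. Returning 0.0 for an empty input makes {1, 0, -1}, {0, 0, 0} and {} all report the same mean. Returning NaN, used here as one possible encoding of "no meaningful value", keeps the empty case distinguishable.

```cpp
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical: the convention many online examples use (empty -> 0.0).
double mean_zero_default(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    return std::accumulate(v.begin(), v.end(), 0.0) /
           static_cast<double>(v.size());
}

// Hypothetical: a convention that keeps "no data" distinguishable (empty -> NaN).
double mean_nan_default(const std::vector<double>& v) {
    if (v.empty()) return std::numeric_limits<double>::quiet_NaN();
    return std::accumulate(v.begin(), v.end(), 0.0) /
           static_cast<double>(v.size());
}
```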



Source: https://stackoverflow.com/questions/39706777/what-is-the-arithmetic-mean-of-an-empty-sequence
