Question
I'm running a Django application on top of a MySQL (actually MariaDB) database.
My Django Model looks like this:
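import logging

from django.db import models
from django.db.models import Avg, Max, Min, Count

# Assumption: a module-level logger; the original does not show how `logger` is defined.
logger = logging.getLogger(__name__)


class myModel(models.Model):
    my_string = models.CharField(max_length=32)
    my_date = models.DateTimeField()

    @staticmethod
    def get_stats():
        # Group rows by my_string and aggregate my_date within each group.
        logger.info(myModel.objects.values('my_string').annotate(
            count=Count('my_string'),
            min=Min('my_date'),
            max=Max('my_date'),
            avg=Avg('my_date'),
        ))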
When I run get_stats(), I get the following log line:
[2015-06-21 09:45:40] INFO [all_logs:96] [{'my_string': u'A', 'count': 2, 'avg': 20080507582679.5, 'min': datetime.datetime(2007, 8, 2, 11, 33, 53, tzinfo=<UTC>), 'max': datetime.datetime(2009, 2, 13, 5, 20, 6, tzinfo=<UTC>)}]
The problem I have with this is that the average of the my_date field returned by the database is: 20080507582679.5. Look carefully at that number. It is an invalid date format.
Why doesn't the database return a valid value for the average of these two dates? How do I get the actual average of this field if the approach above fails? Is Django's DateTimeField not set up to handle averaging?
Answer 1:
Q1: Why doesn't the database return a valid value for the average of these two dates?
A: The value returned is expected; it is well-defined MySQL behavior.
MySQL automatically converts a date or time value to a number if the value is used in a numeric context and vice versa.
MySQL Reference Manual: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-types.html
In MySQL, the AVG aggregate function operates on numeric values.
In MySQL, a DATE or DATETIME expression can be evaluated in a numeric context.
As a simple demonstration, performing a numeric addition on a DATETIME implicitly converts the datetime value into a number. This query:
SELECT NOW(), NOW()+0
returns a result like:
NOW()                NOW()+0
-------------------  ---------------------
2015-06-23 17:57:48  20150623175748.000000
Note that the value returned for the expression NOW()+0 is not a DATETIME, it's a number.
When you specify a SUM() or AVG() function on a DATETIME expression, that is equivalent to converting the DATETIME into a number and then summing or averaging that number.
That is, the value returned by AVG(mydatetimecol) is the same as the value returned by AVG(mydatetimecol+0).
What is being "averaged" is a numeric value. As you have observed, the value returned is not a valid datetime; and even in cases where it happens to look like a valid datetime, it is likely not a value you would consider a true "average".
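The arithmetic can be reproduced outside the database. The following is a minimal Python sketch, assuming the two rows in group 'A' are exactly the min and max dates shown in the log line; the helper name mysql_numeric_value is made up for illustration and only mimics the YYYYMMDDHHMMSS reading described above.

```python
from datetime import datetime


def mysql_numeric_value(dt):
    """Mimic a DATETIME read in numeric context: the digits YYYYMMDDHHMMSS."""
    return int(dt.strftime("%Y%m%d%H%M%S"))


# The min and max dates from the log line in the question (assumed to be the two rows).
dates = [
    datetime(2007, 8, 2, 11, 33, 53),
    datetime(2009, 2, 13, 5, 20, 6),
]

numbers = [mysql_numeric_value(d) for d in dates]
naive_avg = sum(numbers) / len(numbers)

print(numbers)    # [20070802113353, 20090213052006]
print(naive_avg)  # 20080507582679.5
```

The result matches the 'avg' value in the log. Read as a date, its digits would mean hour 58 and second 79, which is why it does not look like a valid datetime.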
Q2: How do I get the actual average of this field if the way described fails?
A: One way to do that is to convert the datetime into a numeric value that can be "accurately" averaged, and then convert that average back into a datetime.
For example, you could convert the datetime into a numeric value representing a number of seconds from some fixed point in time, e.g.
TIMESTAMPDIFF(SECOND,'2015-01-01',t.my_date)
You could then "average" those values to get an average number of seconds from the fixed point in time. (NOTE: when aggregating an extremely large number of rows with extremely large values, beware of exceeding the maximum numeric value, i.e. numeric overflow.)
AVG(TIMESTAMPDIFF(SECOND,'2015-01-01',t.my_date))
To convert that back to a datetime, add that value, as a number of seconds, back to the fixed point in time:
'2015-01-01' + INTERVAL AVG(TIMESTAMPDIFF(SECOND,'2015-01-01',t.my_date)) SECOND
(Note that the DATETIME values are evaluated in the time zone of the MySQL session, so there are edge cases where the setting of the time_zone variable in the MySQL session will have some influence on the value returned.)
MySQL also provides a UNIX_TIMESTAMP() function, which returns a Unix-style integer value: the number of seconds since the Unix epoch (midnight, Jan. 1, 1970 UTC). You can use that to accomplish the same operation more concisely:
FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(t.my_date)))
Note that this final expression is really doing the same thing... converting the datetime value into a number of seconds since '1970-01-01 00:00:00' UTC, taking a numeric average of that, and then adding that average number of seconds back to '1970-01-01' UTC, and finally converting that back to a DATETIME value, represented in the current session time_zone.
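The same recipe can be sketched in plain Python to see that the choice of reference point does not change the result. This is only an illustration under stated assumptions: the function name average_datetime is hypothetical, and the two dates are the min and max values from the question, treated as UTC.

```python
from datetime import datetime, timedelta, timezone


def average_datetime(dates, reference):
    """Average datetimes by averaging their offsets in seconds from a reference."""
    offsets = [(d - reference).total_seconds() for d in dates]
    mean_offset = sum(offsets) / len(offsets)
    return reference + timedelta(seconds=mean_offset)


utc = timezone.utc
dates = [
    datetime(2007, 8, 2, 11, 33, 53, tzinfo=utc),
    datetime(2009, 2, 13, 5, 20, 6, tzinfo=utc),
]

# Fixed reference point, as in the TIMESTAMPDIFF(SECOND,'2015-01-01',...) variant.
via_2015 = average_datetime(dates, datetime(2015, 1, 1, tzinfo=utc))
# Unix epoch, as in the FROM_UNIXTIME(AVG(UNIX_TIMESTAMP(...))) variant.
via_epoch = average_datetime(dates, datetime(1970, 1, 1, tzinfo=utc))

print(via_2015)               # 2008-05-08 20:26:59.500000+00:00
print(via_epoch == via_2015)  # True
```

Using timezone-aware values sidesteps the session time zone question here; in MySQL the result still depends on the session time_zone as noted above.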
Q3: Is Django's DateTimeField not set up to handle averaging?
A: Apparently, the authors of Django are satisfied with the value returned from the database for a SQL expression AVG(datetime).
Answer 2:
Plan A: Use a TIMESTAMP field instead of a DATETIME field
Plan B: Convert DATETIME to TIMESTAMP during the computation:
FROM_UNIXTIME(ROUND(AVG(UNIX_TIMESTAMP(`my_date`))))
(Sorry, I don't know the Django syntax needed.)
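If the SQL expression cannot easily be expressed through the ORM, one fallback is to do the averaging in Python on fetched rows. The sketch below is an assumption-laden illustration, not Django-specific code: average_dates_by_key is a hypothetical helper, the rows are (my_string, my_date) tuples such as myModel.objects.values_list('my_string', 'my_date') would yield, the datetimes are assumed to be timezone-aware, and the 'B' row is made-up sample data. It rounds to whole seconds in the spirit of ROUND(), with halves rounded up.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone


def average_dates_by_key(rows):
    """Return {key: average datetime} for (key, aware datetime) pairs, to whole seconds."""
    stamps = defaultdict(list)
    for key, moment in rows:
        stamps[key].append(moment.timestamp())  # seconds since the Unix epoch
    return {
        # floor(x + 0.5) rounds halves up; the result is expressed in UTC.
        key: datetime.fromtimestamp(math.floor(sum(v) / len(v) + 0.5), tz=timezone.utc)
        for key, v in stamps.items()
    }


utc = timezone.utc
rows = [
    ("A", datetime(2007, 8, 2, 11, 33, 53, tzinfo=utc)),
    ("A", datetime(2009, 2, 13, 5, 20, 6, tzinfo=utc)),
    ("B", datetime(2015, 6, 21, 9, 45, 40, tzinfo=utc)),  # made-up row
]
averages = average_dates_by_key(rows)
print(averages["A"])  # 2008-05-08 20:27:00+00:00
```

This pulls every row into Python, so it trades database-side aggregation for simplicity and is only reasonable for modest table sizes.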
Answer 3:
When you use values(), Django will not convert the value it got from the database-python connector. It's up to the connector to determine how the value is returned.
In this case, it seems that the MySQL connector returns a representation of the date with the separators removed. You can try to use datetime.strptime() with a matching format to parse it into a datetime object.
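Whether parsing succeeds depends on the digits actually forming a date. As a quick check, here is a small sketch that tries the suggestion on the 'avg' value from the log line in the question; the format string "%Y%m%d%H%M%S" is an assumption about what a "matching format" would be.

```python
from datetime import datetime

raw_avg = 20080507582679.5   # the 'avg' value from the log line in the question
digits = str(int(raw_avg))   # drop the fractional part: "20080507582679"

try:
    parsed = datetime.strptime(digits, "%Y%m%d%H%M%S")
except ValueError as exc:
    # The digits would have to mean hour 58 and second 79, so parsing is rejected.
    parsed = None
    print("not a valid datetime:", exc)
```

For this particular value the parse raises ValueError, so strptime() is only useful when the number happens to spell out a real date and time.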
Source: https://stackoverflow.com/questions/30963319/why-does-mysql-db-return-a-corrupted-value-when-averaging-over-a-django-models-d