Question
[10] pry(main)> r.respondents.select(:name).uniq.size
   (1.1ms)  SELECT DISTINCT COUNT("respondents"."name") FROM "respondents" INNER JOIN "values" ON "respondents"."id" = "values"."respondent_id" WHERE "values"."round_id" = 37
=> 495
[11] pry(main)> r.respondents.select(:name).uniq.length
  Respondent Load (1.1ms)  SELECT DISTINCT name FROM "respondents" INNER JOIN "values" ON "respondents"."id" = "values"."respondent_id" WHERE "values"."round_id" = 37
=> 6
Why do these two calls return different results?
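The two logged SQL statements compute different things. In the first, COUNT has already collapsed every joined row into a single result row, so the DISTINCT that follows SELECT has nothing left to de-duplicate; counting distinct names needs COUNT(DISTINCT name), or loading the DISTINCT names and counting them in Ruby, which is what .length ends up doing. A minimal plain-Ruby sketch of the three computations, using made-up in-memory rows in place of the respondents/values join (the data and variable names are hypothetical):

```ruby
# Hypothetical joined rows: a respondent appears once per matching "values"
# row, so names repeat. Made-up data, not the asker's.
rows = [
  { name: "alice" }, { name: "alice" }, { name: "bob" },
  { name: "bob" },   { name: "bob" },   { name: "carol" }
]

# SELECT DISTINCT COUNT(name) ...
# COUNT(name) yields ONE row holding the number of non-NULL names;
# DISTINCT then de-duplicates that single row, which changes nothing.
distinct_count = [rows.count { |r| !r[:name].nil? }].uniq.first

# SELECT COUNT(DISTINCT name) ...
# The names are de-duplicated first (NULLs ignored), then counted.
count_distinct = rows.map { |r| r[:name] }.compact.uniq.size

# SELECT DISTINCT name ..., then counting the loaded records in Ruby.
loaded_length = rows.map { |r| r[:name] }.uniq.size

puts distinct_count # prints 6
puts count_distinct # prints 3
puts loaded_length  # prints 3
```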
Answer 1:
.count #=> always triggers a SELECT COUNT(*) on the database
.size #=> if the collection has already been loaded, returns the length of the loaded records (Array#size); otherwise does the SELECT COUNT(*)
.length #=> always loads the collection, then returns the length of the loaded records
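These three rules can be modeled with a small plain-Ruby class. This is a hypothetical sketch (FakeRelation and its queries log are invented for illustration and are not ActiveRecord's code) that records which kind of query each call would issue:

```ruby
# Hypothetical stand-in for a lazily loaded relation. It only models WHEN a
# query would run; it is not ActiveRecord's implementation.
class FakeRelation
  attr_reader :queries, :records

  def initialize(rows)
    @rows = rows    # pretend these rows live in the database
    @records = nil  # populated only once the relation is loaded
    @queries = []   # log of the kind of SQL each call would issue
  end

  def loaded?
    !@records.nil?
  end

  # Always asks the database: SELECT COUNT(*) ...
  def count
    @queries << :count
    @rows.size
  end

  # Uses the loaded records if present, otherwise falls back to count.
  def size
    loaded? ? @records.length : count
  end

  # Always loads the records (at most once), then measures the Array.
  def length
    load.records.length
  end

  # Fetches the records once and returns the relation itself.
  def load
    unless loaded?
      @queries << :select
      @records = @rows.dup
    end
    self
  end
end

rel = FakeRelation.new(%w[a b c])
rel.size      # not loaded yet: issues :count
rel.length    # loads: issues :select
rel.size      # already loaded: no new query
rel.count     # always queries: issues :count
p rel.queries # prints [:count, :select, :count]
```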
Answer 2:
r.respondents.select(:name).uniq returns an ActiveRecord::Relation object, which overrides size.
See: http://api.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-size
Calling size on such an object checks to see if the object is "loaded."
# Returns size of the records.
def size
  loaded? ? @records.length : count
end
If it is "loaded", it returns the length of the @records array. Otherwise, it calls count, which, without arguments, will "return a count of all the rows for the model."
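This is how the two calls in the question can disagree: before loading, size returns whatever the COUNT query says, and once the records are loaded it returns how many were actually fetched. A hypothetical sketch in which the COUNT path keeps duplicates (mimicking the SELECT DISTINCT COUNT(...) statement) while loading removes them (mimicking SELECT DISTINCT name); the class and data are invented for illustration:

```ruby
# Hypothetical relation whose COUNT query and record-loading query disagree.
class MismatchedRelation
  def initialize(names)
    @names = names
    @records = nil
  end

  def loaded?
    !@records.nil?
  end

  # Like the logged COUNT: DISTINCT only applies to the single count row,
  # so duplicate names are still counted.
  def count
    @names.size
  end

  # Like the logged SELECT DISTINCT name: duplicates are removed on load.
  def to_a
    @records ||= @names.uniq
  end

  # Same shape as the ActiveRecord::Relation#size quoted above.
  def size
    loaded? ? @records.length : count
  end

  def length
    to_a.length
  end
end

rel = MismatchedRelation.new(%w[ann ann bob bob bob cy])
size_before_load = rel.size   # COUNT path
length_value     = rel.length # loads the distinct names
size_after_load  = rel.size   # loaded path, now agrees with length
```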
So why this behavior? An AR::Relation is only "loaded" if either to_a or explain is called on it first:
https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation.rb
The reasoning is given in the comment above the load method:
# Causes the records to be loaded from the database if they have not
# been loaded already. You can use this if for some reason you need
# to explicitly load some records before actually using them. The
# return value is the relation itself, not the records.
#
#   Post.where(published: true).load # => #<ActiveRecord::Relation>
def load
  unless loaded?
    # We monitor here the entire execution rather than individual SELECTs
    # because from the point of view of the user fetching the records of a
    # relation is a single unit of work. You want to know if this call takes
    # too long, not if the individual queries take too long.
    #
    # It could be the case that none of the queries involved surpass the
    # threshold, and at the same time the sum of them all does. The user
    # should get a query plan logged in that case.
    logging_query_plan { exec_queries }
  end
  self
end
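The comment's point is that loading happens at most once and is timed as a single unit of work. A minimal plain-Ruby sketch of that pattern, assuming a simple elapsed-time threshold; SketchRelation, log_if_slow, slow_log and the threshold parameter are hypothetical stand-ins for Rails' exec_queries and logging_query_plan, and the real behavior of logging a query plan is not reproduced:

```ruby
# Hypothetical sketch: run all queries behind a relation once, time them as
# ONE unit, and record the total when it exceeds a threshold.
class SketchRelation
  attr_reader :records, :slow_log, :exec_calls

  def initialize(queries, threshold:)
    @queries = queries     # lambdas standing in for individual SELECTs
    @threshold = threshold # seconds
    @records = nil
    @slow_log = []
    @exec_calls = 0
  end

  def loaded?
    !@records.nil?
  end

  def load
    unless loaded?
      # Time the whole execution, not each query separately.
      log_if_slow { exec_queries }
    end
    self # the relation itself, not the records
  end

  private

  def exec_queries
    @exec_calls += 1
    @records = @queries.flat_map(&:call)
  end

  def log_if_slow
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    @slow_log << elapsed if elapsed > @threshold
    result
  end
end

# Each "query" sleeps 0.03 s: each is under the 0.05 s threshold on its own,
# but together (about 0.06 s) they exceed it.
query = -> { sleep 0.03; [:row] }
rel = SketchRelation.new([query, query], threshold: 0.05)
rel.load
rel.load # already loaded: exec_queries does not run again
```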
So perhaps AR::Relation#size should be read as describing the relation's query (answered with a COUNT unless the records are already loaded), whereas length always counts the records actually returned.
Answer 3:
While upgrading from Rails 3.2 to 4.1, it seems AR::Relation#size behaves differently. Previously it returned the number of "rows", whereas (in my case) it now returns a Hash. Switching to #count seems to give the same result that #size gave in 3.2. I'm being a bit vague here because running tests in 'rails console' on 4.1 did not give the same results as running via 'rails server' on 4.1.
Source: https://stackoverflow.com/questions/11905364/difference-between-size-length-and-count-in-complicated-activerecord-case