Is DISTINCT an expensive query in Django?

Submitted by 谁说我不能喝 on 2019-12-24 09:58:15

Question


I have three models: Product, Category and Place. Product has a ManyToMany relation with both Category and Place. I need to get the list of categories that have at least one product matching a specific place. For example, I might need all the categories that have at least one product from Boston.

I have 100 categories, 500 places and 100,000 products.

In SQLite with 10K products the query takes about a second. In production I'll use PostgreSQL.

I'm using:

categories = Category.objects.distinct().filter(product__place__name="Boston")

Is this query going to be expensive? Is there a better way to do this?

This is the result of connection.queries:

{'time': '0.929', 'sql': u'SELECT DISTINCT "catalog_category"."id", "catalog_category"."name" FROM "catalog_category" INNER JOIN "catalog_product_categories" ON ("catalog_category"."id" = "catalog_product_categories"."category_id") INNER JOIN "catalog_product" ON ("catalog_product_categories"."product_id" = "catalog_product"."id") INNER JOIN "catalog_product_places" ON ("catalog_product"."id" = "catalog_product_places"."product_id") INNER JOIN "catalog_place" ON ("catalog_product_places"."car_id" = "catalog_car"."id") WHERE "catalog_place"."name" = Boston  ORDER BY "catalog_category"."name" ASC'}]

Thanks


Answer 1:


This is not just a Django issue; DISTINCT is slow on most SQL implementations because the database typically has to sort or hash the whole result set to find and drop duplicate rows. Here is a good discussion of why it's slow in Postgres specifically.

One way to handle this would be to use Django's excellent caching mechanism on this query, assuming that the results don't change often and minor staleness isn't a problem. Another approach would be to keep a separate list of just the distinct categories, perhaps in another table.
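The "separate table" idea can be sketched with plain SQL. The example below is only an illustration using the standard library's sqlite3 module, not the asker's actual schema: the table names, the `place_category` summary table, and the helper functions `refresh_place_category` and `categories_for_place` are all hypothetical. The summary table stores each (place, category) pair once, so the lookup needs no DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_categories (product_id INTEGER, category_id INTEGER);
    CREATE TABLE product_places (product_id INTEGER, place_id INTEGER);
    -- Hypothetical summary table: at most one row per (place, category) pair.
    CREATE TABLE place_category (
        place_id INTEGER,
        category_id INTEGER,
        PRIMARY KEY (place_id, category_id)
    );

    INSERT INTO category VALUES (1, 'Books'), (2, 'Toys'), (3, 'Garden');
    INSERT INTO place VALUES (1, 'Boston'), (2, 'Denver');
    INSERT INTO product VALUES (1, 'Novel'), (2, 'Atlas'), (3, 'Kite');
    -- Two Boston products share the 'Books' category; the kite is a Denver toy.
    INSERT INTO product_categories VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO product_places VALUES (1, 1), (2, 1), (3, 2);
""")


def refresh_place_category(conn):
    """Rebuild the summary table from the two many-to-many link tables."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM place_category")
        # The primary key plus OR IGNORE drops repeated pairs at write time.
        conn.execute("""
            INSERT OR IGNORE INTO place_category (place_id, category_id)
            SELECT pp.place_id, pc.category_id
            FROM product_places AS pp
            JOIN product_categories AS pc ON pc.product_id = pp.product_id
        """)


def categories_for_place(conn, place_name):
    """Return category names for a place; no DISTINCT is needed here."""
    rows = conn.execute("""
        SELECT c.name
        FROM category AS c
        JOIN place_category AS plc ON plc.category_id = c.id
        JOIN place AS p ON p.id = plc.place_id
        WHERE p.name = ?
        ORDER BY c.name
    """, (place_name,))
    return [name for (name,) in rows]


refresh_place_category(conn)
boston_categories = categories_for_place(conn, "Boston")
denver_categories = categories_for_place(conn, "Denver")
print(boston_categories, denver_categories)
```

The trade-off is the same as with caching: the summary table has to be refreshed whenever products, places or categories change, so it only pays off if reads are much more frequent than writes.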




Answer 2:


Although Chase is right that DISTINCT is generally a slow operation, in this case it is also completely pointless. As you can see from the generated SQL, the DISTINCT is being done on the combination of ID and name - which will never be duplicated anyway. So there is no need for the distinct() call in this query.

Generally, Django does not return duplicate results from a simple filter. distinct() is mainly useful when you are accessing a related queryset via a ManyToMany or ForeignKey relationship, where multiple items might be related to the same instance; there distinct() will remove the duplicates.
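The many-to-many case described above can be tried out on a toy schema. This is a minimal sketch with the standard library's sqlite3 module; the table names and data are made up for illustration and are not the asker's real tables. Django's documentation notes that a filter spanning a multi-valued relation can return the same object more than once, so it is worth checking a query like the one in the question against real data before dropping distinct(). With the toy data below, the plain join returns the 'Books' row twice (once per matching product), DISTINCT collapses it to one row, and an EXISTS subquery gives the same single row without a de-duplication step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_categories (product_id INTEGER, category_id INTEGER);
    CREATE TABLE product_places (product_id INTEGER, place_id INTEGER);

    INSERT INTO category VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO place VALUES (1, 'Boston');
    -- Products 1 and 2 are both 'Books' and both in Boston; product 3 is a toy
    -- that is not linked to any place.
    INSERT INTO product_categories VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO product_places VALUES (1, 1), (2, 1);
""")

# Shared join from categories through products to places.
JOINS = """
    FROM category AS c
    JOIN product_categories AS pc ON pc.category_id = c.id
    JOIN product_places AS pp ON pp.product_id = pc.product_id
    JOIN place AS p ON p.id = pp.place_id
    WHERE p.name = 'Boston'
"""

# Without DISTINCT: one result row per matching product.
plain = conn.execute("SELECT c.id, c.name" + JOINS).fetchall()

# With DISTINCT: repeated (id, name) rows are collapsed.
deduped = conn.execute("SELECT DISTINCT c.id, c.name" + JOINS).fetchall()

# EXISTS alternative: each category is tested once, so no duplicates arise.
semi_join = conn.execute("""
    SELECT c.id, c.name
    FROM category AS c
    WHERE EXISTS (
        SELECT 1
        FROM product_categories AS pc
        JOIN product_places AS pp ON pp.product_id = pc.product_id
        JOIN place AS p ON p.id = pp.place_id
        WHERE pc.category_id = c.id AND p.name = 'Boston'
    )
""").fetchall()

print(plain)
print(deduped)
print(semi_join)
```

Whether the EXISTS form is actually faster than DISTINCT depends on the database, the indexes and the data, so comparing the query plans on PostgreSQL with realistic volumes is the safer guide.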



Source: https://stackoverflow.com/questions/1977739/is-distinct-an-expensive-query-in-django
