Question
From my SQL background, I know the following:
The cardinality of an index is the number of unique values within it. Your database table may have a billion rows in it, but if it only has 8 unique values among those rows, your cardinality is very low.
A low-cardinality index is not a major efficiency gain. Most SQL indexes are B-Trees (balanced multi-way search trees, not binary trees). Compared with a serial scan of every row in a table to find matching rows, a B-Tree logarithmically reduces the number of comparisons that have to be made. The gains from executing a search against a B-Tree are very low when the size of the tree is small.
So what about putting an index on a Boolean field, or on an enumerated-value field? A very small number of distinct values among a very large number of rows will not yield noticeable efficiency gains. Save your database indexes for fields with very high cardinality, where the gain from searching a B-Tree over a sequential scan is largest.
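To make the cardinality argument concrete, here is a minimal sketch in plain Python (no database involved). The `statuses` column and its four values are made-up illustration data, and the helper names are hypothetical. It shows that with 4 distinct values spread evenly, an equality filter still matches about a quarter of all rows, so an index cannot narrow the search very much.

```python
def cardinality(values):
    """Number of distinct values in the column."""
    return len(set(values))


def selectivity(values):
    """Distinct values divided by total rows; closer to 1.0 means more selective."""
    return cardinality(values) / len(values) if values else 0.0


def match_fraction(values, target):
    """Fraction of rows that an equality filter on `target` would return."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == target) / len(values)


# Hypothetical column: 10,000 rows, evenly spread over 4 status values.
statuses = ["new", "paid", "shipped", "cancelled"] * 2_500

print(cardinality(statuses))             # 4
print(selectivity(statuses))             # tiny: 4 distinct values over 10,000 rows
print(match_fraction(statuses, "paid"))  # 0.25, a quarter of the table matches
```

If the distribution were skewed (say, only a handful of rows were "cancelled"), a filter on the rare value would match far fewer rows, which is the case where an index can still pay off.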
What about MongoDB? Should we create an index on a low-cardinality field that is often filtered on, for instance an enum field with 4 statuses?
Answer 1:
Yes, MongoDB has the same issue, since it also uses B-Trees for indexing. An index on a low-cardinality field will give little performance benefit there as well.
Here's a good article about it
https://www.percona.com/blog/2018/12/19/using-partial-and-sparse-indexes-in-mongodb/
Although there is no easy or supported solution, it gives a few options (partial indexes) for specific cases:
- you run queries on a boolean field with an uneven distribution, and you look mostly for the less frequent value
- you have a low cardinality field and the majority of the queries look for a subset of the values
- the majority of the queries look for a limited subset of the values in a field
- you don’t have enough memory to store very large indexes – for example, you have a lot of page evictions from the WiredTiger cache
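The linked article covers partial and sparse indexes. As a rough sketch of the partial-index option for a status field, the helper below indexes only the documents with one rarely occurring status. The field name `status`, the value `"pending"`, the index name, and the helper itself are assumptions for illustration. It only relies on pymongo's `Collection.create_index`, which accepts `partialFilterExpression` as a keyword option.

```python
def create_pending_status_index(collection):
    """Create a partial index that covers only documents with status 'pending'.

    `collection` is expected to behave like a pymongo Collection.
    The field name and value are hypothetical; adapt them to your schema.
    """
    return collection.create_index(
        [("status", 1)],                                # ascending index on the status field
        name="status_pending_partial",                  # hypothetical index name
        partialFilterExpression={"status": "pending"},  # index only the rare value
    )
```

With pymongo installed and a server reachable, this could be called as `create_pending_status_index(MongoClient()["app"]["orders"])`, where the database and collection names are again placeholders. A query can only use a partial index if its filter is guaranteed to fall inside the indexed subset, for example `find({"status": "pending"})`. Queries for the other status values would not use it.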
Source: https://stackoverflow.com/questions/51579528/mongodb-low-cardinality-index