If i\'m using soft delete in laravel 4.2 (database is mysql), every eloquent query has WHERE deleted_at IS NULL
. There are no indexes on deleted_at
I don't know why the above by @N.B. has so many upvotes, in my context I find this completely incorrect.
I added indexes to deleted_at timestamps on some keys tables and enjoyed some queries dropping from 32s to less than 5.4ms. It really depends on the nature of of your app.
In my scenario, I have 3 tables with soft deletes, some simple joins (all with indexes), but my queries suffered due to the default nature of Laravel's handling of soft deletes.
I highly recommended indexing these columns so your app does choke when record numbers elevate.
Short Answer: Maybe.
Long Answer:
If there are very few different values in deleted_at
, MySQL will not use INDEX(deleted_at)
.
If there are a not of different non-null dates in deleted_at
, MySQL will use INDEX(deleted_at)
.
Most of the discussion (so far) has failed to bring the cardinality of this single-column index into consideration.
Note: This is not the same as a 2-value flag such as is_deleted
. It is useless to have a single-column index on such.
More discussion (from MySQL point of view)
https://laravel.com/docs/5.2/eloquent#soft-deleting says
Now, when you call the delete method on the model, the deleted_at column will be set to the current date and time. And, when querying a model that uses soft deletes, the soft deleted models will automatically be excluded from all query results.
From that, I assume this is occurring in the table definition:
deleted_at DATETIME NULL -- (or TIMESTAMP NULL)
And the value is initialized (explicitly or implicitly) to NULL
.
Case 1: Lots of new rows, none yet 'deleted': All the deleted_at
values are NULL
. In this case, the Optimizer will shun INDEX(deleted_at)
as not helping. In fact using the index would hurt because it would cost more to go through the entire index and the data. It would be cheaper to ignore the index and simply assume all rows are candidates for being SELECTed
.
Case 2: A few rows (out of many) have been deleted: Now deleted_at
has multiple values. Although Laravel only cares about IS NULL
vs IS NOT NULL
, MySQL sees it as a multi-valued column. But, since the test is for IS NULL
and most rows are still NULL
, the Optimizer's reaction is the same as for Case 1.
Case 3: A lot more rows are soft-deleted than still active: Now the index has suddenly become useful because only a small percentage of the table IS NULL
.
There is no exact cutoff between Case 2 and Case 3. 20% is a handy Rule of Thumb.
Now, from the execution point of view.
INDEX(deleted_at)
used for deleted_at IS NULL
:
NULL
.IS NULL
fails.INDEX(deleted_at)
is not used:
deleted_at IS NULL
, else filter out that row.Composite index:
It may be very beneficial to have a "composite" (multi-column) index starting with deleted_at
. Example:
INDEX(deleted_at, foo)
WHERE deleted_at IS NULL
AND foo BETWEEN 111 AND 222
This is very likely to use the index effectively regardless of what percentage of the table has deleted_at IS NULL
.
NULL
and foo >= 111
.IS NULL
or foo <= 222
fails.Note that in an INDEX
, NULL
acts very much like any other single value. (And NULLs
are stored before other values.)
The column deleted_at
is not a good index candidate. I'll try to explain better compared to the comment: indexes are useful only when their cardinality is relatively high. Cardinality is a number that describes index uniqueness in the data-set. That means it's total number of records divided by total unique records.
For example, the cardinality of primary key is 1. Every record contains unique value for primary key. 1 is also, the highest number. You can consider it as a "100%".
But, a column such as deleted_at doesn't have such a value. What Laravel does with deleted_at is check whether it is or isn't null. That means it has two possible values. Columns that contain two values have extremely low cardinality which decreases as number of records goes up.
You can index such a column, but it won't be of any help. What will happen is that it could slow things down and take up space.
TL;DR: no, you don't have to index that column, index will have no beneficial impact on performance.