Question
I have a large number of items (1M+) that I want to delete from a database. I fork a background job to take care of the deletion so that the user won't have to wait for it to finish before carrying on with whatever he/she was doing. The problem is that the app becomes unresponsive while the items are being deleted, so I thought I would process the items chunk by chunk and sleep for a couple of seconds between chunks.
Here is the code that handles the delete:
// laravel job class
// ...
public function handle()
{
    $posts_archive = PostArchive::find(1); // just for the purpose of testing ;)
    Post::where('arch_id', $posts_archive->id)->chunk(1000, function ($posts) {
        // go through the collection and delete every post
        foreach ($posts as $post) {
            $post->delete();
        }
        // throttle
        sleep(2);
    });
}
Expected result: the posts are chunked and each chunk is processed, then the job idles for 2 seconds, repeating until all the items are deleted.
Actual result: a random number of items is deleted once, then the process ends. No errors, no indicators, no clue.
Is there a better way to implement this?
Answer 1:
There is nothing Laravel-specific about the way you'd handle this. It sounds like your database server needs review or optimization if a delete query in a job is freezing the rest of the UI.
Retrieving each model and running a delete query individually definitely isn't a good way to optimize this, as you'd be executing millions of queries. You could use a while loop with a delete limit if you wish to limit the load per second in your application instead of optimizing your database server to handle the query in one go:
do {
    $deleted = Post::where('arch_id', $posts_archive->id)->limit(1000)->delete();
    sleep(2);
} while ($deleted > 0);
Answer 2:
The reason your actual outcome differs from the expected outcome lies in how Laravel chunks your dataset.
Laravel paginates through your dataset one page at a time and passes the Collection of Post models to your callback.
Since you're deleting the records in the set, Laravel effectively skips a page of data on each iteration, so you end up missing roughly half the data that was in the original query.
Take the following scenario – there are 24 records that you wish to delete in chunks of 10:
Expected
+-------------+--------------------+---------------------------+
| Iteration   | Eloquent query     | Rows returned to callback |
+-------------+--------------------+---------------------------+
| Iteration 1 | OFFSET 0 LIMIT 10  | 10                        |
| Iteration 2 | OFFSET 10 LIMIT 10 | 10                        |
| Iteration 3 | OFFSET 20 LIMIT 10 | 4                         |
+-------------+--------------------+---------------------------+
Actual
+-------------+--------------------+------------------------------+
| Iteration   | Eloquent query     | Rows returned to callback    |
+-------------+--------------------+------------------------------+
| Iteration 1 | OFFSET 0 LIMIT 10  | 10 (« but these are deleted) |
| Iteration 2 | OFFSET 10 LIMIT 10 | 4                            |
| Iteration 3 | NONE               | NONE                         |
+-------------+--------------------+------------------------------+
After the 1st iteration there were only 14 records left, so when Laravel fetched page 2, it only found 4 records.
The result is that 14 records out of 24 were deleted, which feels a bit random but makes sense in terms of how Laravel processes the data.
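As an aside, if your Laravel version provides chunkById, it sidesteps this problem entirely: it pages on the primary key (WHERE id > ? ORDER BY id) instead of using OFFSET, so deleting rows inside the callback doesn't shift later pages. A minimal sketch, reusing the query from the original job:
$posts_archive = PostArchive::find(1);
Post::where('arch_id', $posts_archive->id)->chunkById(1000, function ($posts) {
    foreach ($posts as $post) {
        $post->delete();
    }
    // throttle, as in the original job
    sleep(2);
});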
Another solution to the problem would be to use a cursor to process your query; this steps through your DB result set one record at a time, which makes better use of memory.
E.g.
// laravel job class
// ...
public function handle()
{
    $posts_archive = PostArchive::find(1); // just for the purpose of testing ;)
    $query = Post::where('arch_id', $posts_archive->id);
    foreach ($query->cursor() as $post) {
        $post->delete();
    }
}
NB: The other solutions here are better if you only want to delete the records in the DB, since a cursor still hydrates each model and issues one DELETE query per row. If you have any other processing that needs to occur per record, then using a cursor is the better option.
Answer 3:
If I understand correctly, the issue is that deleting a large number of entries takes too many resources, and doing it one post at a time will also take too long.
Try getting the min and the max of the post id, then chunking on those, like:
$minId = Post::where('arch_id', $posts_archive->id)->min('id');
$maxId = Post::where('arch_id', $posts_archive->id)->max('id');
for ($i = $minId; $i <= $maxId; $i += 1000) {
    // delete one fixed id range per query, then pause
    Post::where('arch_id', $posts_archive->id)->whereBetween('id', [$i, $i + 999])->delete();
    sleep(2);
}
Customize the chunk size and the sleep period as suits your server resources.
Source: https://stackoverflow.com/questions/52483342/laravel-chunk-and-delete