I'm working with the FILESTREAM data type for the first time (SQL Server 2008) and I'm running into issues with fast insert/delete cycles. Basically, the speed at which the files are actually removed from the file system is far slower than the insert/delete speed, even if I invoke the garbage collector manually (as far as I know, a CHECKPOINT is supposed to invoke the garbage collector).
The code below illustrates the problem: it takes roughly 30 seconds to execute, but you have to wait quite a few minutes for the last file to be deleted from the file system (watching the C:\FSTest\Files folder).
Is there any way to speed up the garbage collector? It seems to delete roughly 20 files every 10 seconds, which makes me believe that if I store/delete more than 2 records per second I will eventually end up filling the hard drive.
Thank you
CREATE DATABASE FSTest ON PRIMARY
(NAME = FSTest_data, FILENAME = N'C:\FSTest\FSTest_data.mdf'),
FILEGROUP FSTestFileGroup CONTAINS FILESTREAM
(NAME = FSTestFiles,FILENAME = N'C:\FSTest\Files')
LOG ON
(NAME = 'FSTest_log', FILENAME = N'C:\FSTest\FSTest_log.ldf');
GO
USE FSTest;
GO
CREATE TABLE FSTest (
Guid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWSEQUENTIALID(),
Name VARCHAR (25),
Data VARBINARY(MAX) FILESTREAM);
GO
ALTER DATABASE FSTest SET RECOVERY SIMPLE;
GO
SET NOCOUNT ON
DECLARE @test int
SET @test=0
WHILE @test<1000 BEGIN
INSERT INTO FSTest(Name,Data) VALUES('test',CAST('abc' AS VARBINARY(MAX)))
DELETE FROM FSTest WHERE Name='test'
CHECKPOINT
SET @test = @test+1
END
Update:
I tried the same thing over a longer period, with an insert/delete rate closer to my needs, and after 30 minutes of execution the same situation is observable: files are created much faster than they get deleted.
SET NOCOUNT ON
DECLARE @test int
SET @test=0
WHILE @test<100000 BEGIN
INSERT INTO FSTest(Name,Data) VALUES('test',CAST('abc' AS VARBINARY(MAX)))
DELETE FROM FSTest WHERE Name='test'
WAITFOR DELAY '00:00:00:200'
CHECKPOINT
SET @test = @test+1
END
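For reference, here is a rough way to watch the container from T-SQL instead of Explorer (just a sketch, assuming enabling xp_cmdshell is acceptable on the test box; the path is the one from the setup script above):
-- Enable xp_cmdshell (server-wide setting; disable it again afterwards if needed)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;
-- List the files left in the container (the actual data files live in subfolders)
EXEC xp_cmdshell 'dir /s /b /a-d "C:\FSTest\Files"';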
After some more research (and thanks to Paul Randal's blog, which has lots of very detailed information about filestream and garbage collection): once the rows are deleted and a checkpoint is executed, the files are put in a system table (the tombstone table); then every 10 seconds a process (Ghost Cleanup) runs and removes some items from that table (20, to be exact). So basically we are limited to 2 deletes per second, and there seems to be no way (yet) to change this behavior.
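For the curious, the tombstone table can be located like this (a sketch: the numeric suffix of the table name is database-specific, and reading the table itself only works over a Dedicated Administrator Connection):
USE FSTest;
-- Locate this database's filestream tombstone table
SELECT name
FROM sys.internal_tables
WHERE name LIKE 'filestream_tombstone%';
-- Counting its rows requires a DAC (ADMIN:<server>) connection; the suffix below is illustrative:
-- SELECT COUNT(*) FROM sys.filestream_tombstone_2073058421;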
Since I have a sustained rate of 4 deletes per second, I will need to find an alternative to filestream.
Thanks everyone for your input.
As Remus said, if you're using the full recovery model then things are complicated. But even under the simple recovery model you need to keep in mind that CHECKPOINT invokes the garbage collector (GC), but it doesn't guarantee that GC will delete all the files in a single pass. GC currently has a limit on the number of operations it may carry out in a single invocation. Furthermore, files are deleted with the FILE_DELETE_ON_CLOSE option, so as long as there are open handles to a file you will still see it, even though GC may have already deleted it. Such open handles may be held by anti-virus programs or other file system filter drivers.
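If you want to give GC more chances to run, a small loop like this can help (just a sketch, assuming each CHECKPOINT triggers at most one size-limited GC pass, so several spaced-out checkpoints allow several passes; the iteration count and delay are arbitrary):
-- Issue several checkpoints with small pauses so GC gets multiple passes
DECLARE @pass int
SET @pass = 0
WHILE @pass < 10 BEGIN
    CHECKPOINT
    WAITFOR DELAY '00:00:01'
    SET @pass = @pass + 1
END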
Lastly, unless you're actually running out of disk space, I wouldn't worry too much about stale files: they will eventually get deleted as part of automatic database checkpoints. I believe (although 'believe' is the keyword here) that even though it may have a slow "startup time", GC will keep up with physically deleting files if you run your test steadily for a longer period of time (minutes?).
If you care about performance, keeping filestream containers on a system drive may not be such a good idea. See here for filestream performance tips.
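For example, an additional container can be added to the existing filestream filegroup on another volume (a sketch: D:\FSData is an assumed path on a dedicated non-system drive, and the target Files folder must not already exist):
-- Add a second filestream container on a non-system volume
ALTER DATABASE FSTest
ADD FILE (NAME = FSTestFiles2, FILENAME = N'D:\FSData\Files')
TO FILEGROUP FSTestFileGroup;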
Things are a bit more complicated than a simple checkpoint: the file can be removed only when the last VLF containing log records about the file's creation becomes inactive. See FILESTREAM garbage collection.
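To check whether log truncation is what's holding files back, something along these lines can help (DBCC LOGINFO is undocumented but widely used; a Status of 2 marks a VLF that is still active):
-- Is log reuse (and hence filestream GC) blocked, and why?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'FSTest';
-- Inspect the VLFs of the FSTest log
DBCC LOGINFO ('FSTest');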
Source: https://stackoverflow.com/questions/2158702/sql-server-filestream-delete-speed