Question
I'm performing I/O to a single file from multiple threads. Access to this shared file foo is controlled through an advisory file lock (flock(2) with LOCK_EX). foo was opened with fopen(3) mode a+. a+ was chosen because of the documentation stating:
Subsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.
Simplified, the operations would start:
FILE *fp = fopen("foo", "a+");
...spawn threads...
Writing would continue:
flock(fileno(fp), LOCK_EX);
fwrite(buffer, buffer_size, 1, fp);
flock(fileno(fp), LOCK_UN);
I currently do not have any fflush(3) or fsync(2) calls before the fwrite(3) and am wondering if I should. Does the fopen(3) a+ mode take into account multiple threads hitting the file when calculating the "current EOF"? I know that flock(2) likely has no problem granting me the lock while there is outstanding I/O.
In my limited tests (writing very long lines of ASCII text followed by a newline from multiple threads for many seconds, then checking that every line in the resulting file has the same number of characters), I have not seen any "corruption" when not using fflush(3) or fsync(2). Their presence greatly decreases I/O performance.
tl;dr:
When using file locks, do I need to flush the stream before writing to a file that was opened in a+ mode and is shared between multiple threads? What about multiple forked processes, or different machines writing to a file on a parallel file system?
Possibly related: why fseek or fflush is always required between reading and writing in the read/write "+" modes
Answer 1:
That is the wrong kind of lock. flock is only for locking between processes, not between threads in the same process. From man 2 flock:
A call to flock() may block if an incompatible lock is held by another process. To make a nonblocking request, include LOCK_NB (by ORing) with any of the above operations.
Emphasis added (on "by another process"). And...
A process may only hold one type of lock (shared or exclusive) on a file. Subsequent flock() calls on an already locked file will convert an existing lock to the new lock mode.
You want to use flockfile instead (or additionally, if using multiple processes as well). The flockfile function is used for controlling access to a FILE * from multiple threads. From the man page:
The stdio functions are thread-safe. This is achieved by assigning to each FILE object a lockcount and (if the lockcount is nonzero) an owning thread. For each library call, these functions wait until the FILE object is no longer locked by a different thread, then lock it, do the requested I/O, and unlock the object again. (Note: this locking has nothing to do with the file locking done by functions like flock(2) and lockf(3).)
Like this:
// in one of the threads...
flockfile(fp);
fwrite(..., fp);
funlockfile(fp);
The good news is that on glibc, you don't need to lock the file if you only have one function call from stdio.h in each critical section, since glibc has a fwrite that locks. But this is not true across other platforms, and it certainly doesn't hurt to lock the file. So if you are running on Linux, you would never have noticed that flock doesn't do what you want, since fwrite does it automatically.
About append mode: You do not need extra flushes when writing using append mode, unless you want to ensure ordering between different processes that have the same file open (or one process with multiple handles for the same file). You do not need "a+" mode unless you are reading from the file.
Demonstration of flock
If you don't believe me that flock does NOT provide thread safety between threads using the same file descriptor, here is a demonstration program.
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <stdlib.h>
#include <sys/file.h>

static FILE *fp;
static pthread_mutex_t mutex;
static pthread_cond_t cond;
int state;  // 0 = start, 1 = thread 1 holds the flock, 2 = thread 2 also got it

static void fail_func(int code, const char *func, int line)
{
    fprintf(stderr, "%s:%d: error: %s\n", func, line, strerror(code));
    exit(1);
}

#define fail(code) fail_func(code, __FUNCTION__, __LINE__)

void *thread1(void *p)
{
    int r;
    // Lock file (thread 2 does not have lock yet)
    r = pthread_mutex_lock(&mutex);
    if (r) fail(r);
    r = flock(fileno(fp), LOCK_EX);
    if (r) fail(errno);
    puts("thread1: flock successful");
    state = 1;
    r = pthread_mutex_unlock(&mutex);
    if (r) fail(r);
    // Wake thread 2
    r = pthread_cond_signal(&cond);
    if (r) fail(r);
    // Wait for thread 2
    r = pthread_mutex_lock(&mutex);
    if (r) fail(r);
    while (state != 2) {
        r = pthread_cond_wait(&cond, &mutex);
        if (r) fail(r);
    }
    puts("thread1: exiting");
    r = pthread_mutex_unlock(&mutex);
    if (r) fail(r);
    return NULL;
}

void *thread2(void *p)
{
    int r;
    // Wait for thread 1
    r = pthread_mutex_lock(&mutex);
    if (r) fail(r);
    while (state != 1) {
        r = pthread_cond_wait(&cond, &mutex);
        if (r) fail(r);
    }
    // Also lock file (thread 1 already has lock)
    r = flock(fileno(fp), LOCK_EX);
    if (r) fail(errno);  // flock reports errors through errno, not its return value
    puts("thread2: flock successful");
    // Wake thread 1
    state = 2;
    puts("thread2: exiting");
    r = pthread_mutex_unlock(&mutex);
    if (r) fail(r);
    r = pthread_cond_signal(&cond);
    if (r) fail(r);
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t t1, t2;
    void *ret;
    int r;
    r = pthread_mutex_init(&mutex, NULL);
    if (r) fail(r);
    r = pthread_cond_init(&cond, NULL);
    if (r) fail(r);
    fp = fopen("flockfile.txt", "a");
    if (!fp) fail(errno);
    r = pthread_create(&t1, NULL, thread1, NULL);
    if (r) fail(r);
    r = pthread_create(&t2, NULL, thread2, NULL);
    if (r) fail(r);
    r = pthread_join(t1, &ret);
    if (r) fail(r);
    r = pthread_join(t2, &ret);
    if (r) fail(r);
    puts("done");
    return 0;
}
On my system, it produces the following output:
thread1: flock successful
thread2: flock successful
thread2: exiting
thread1: exiting
done
Note that thread 1 does not release the flock, and thread 2 is able to acquire it anyway. The use of a condition variable ensures that thread 1 does not exit until thread 2 has acquired the lock. This is exactly what the flock man page says: the locks are per-file and per-process (strictly, tied to the open file description), but NOT per-thread.
Summary for atomically appending to a file
In order to make an atomic write between processes and threads, you can do one of two easy things:
1. Use write and write no more than PIPE_BUF bytes. PIPE_BUF is defined in <limits.h>; on my system it is 4096. If the file descriptor is open in O_APPEND mode, then the write will go atomically to the end of the file, no matter who else is writing to the file (threads and/or processes).
2. Use write and flock. If you ever write more than PIPE_BUF bytes at a time, this is your only option for all writes. Again, if the file is open in O_APPEND mode, then the bytes will go to the end of the file. This will happen atomically, but only from the perspective of everyone with an flock.
Additionally,
- If you use <stdio.h> and share a FILE * between threads, you will also need to call flockfile from each thread. This is not needed if you use the lower-level POSIX API (open/write/etc). This is also not needed if you use glibc and every write is a single function call (e.g., you want to atomically fputs).
- If you use only one process, flock is not needed.
Answer 2:
The answer above is not quite right.
write() to a file in append mode is atomic across multiple threads and multiple processes, no matter how many bytes are written at one time. See the standard: http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
For a write() to a FIFO or pipe, PIPE_BUF limits the maximum size of an atomic write.
The stdio library does not guarantee atomic append writes across multiple processes or multiple threads, because each FILE * has its own buffer.
flockfile works only when 1. only one process operates on the file, and 2. multiple threads write to the file through one FILE *.
So, when multiple processes or multiple threads need to write to a file with stdio functions, using an advisory lock is the only choice. flock is not specified by POSIX (it originated in BSD and is also available on Linux), whereas fcntl locking is portable.
Source: https://stackoverflow.com/questions/8717490/is-a-sync-flush-needed-before-writes-to-a-locked-file-from-multiple-threads-proc