Do we need mutex to perform multithreading file IO

Posted by 时光怂恿深爱的人放手 on 2020-01-06 07:27:12

Question


I'm trying to do random writes (as a benchmark test) to a file using multiple threads (pthread). If I comment out the mutex lock, the created file is smaller than expected, as if some writes are getting lost (the shortfall is always a multiple of the chunk size). If I keep the mutex, the file is always exactly the expected size.

Does my code have a problem somewhere else, so that the mutex is not really required (as suggested by @evan), or is the mutex necessary here?

void *DiskWorker(void *threadarg) {

    FILE *theFile = fopen(fileToWrite, "a+");
    ....
    for (long i = 0; i < noOfWrites; ++i) {
        //pthread_mutex_lock(&mutexsum);

        // For random access: seek to a random chunk, then write
        fseek(theFile, randomArray[i] * chunkSize, SEEK_SET);
        fputs(data, theFile);

        // Or, for sequential access (in this case the two lines above would not be here)
        fprintf(theFile, "%s", data);
        // sequential access end

        fflush(theFile);
        //pthread_mutex_unlock(&mutexsum);
    }
    .....
}

Answer 1:


You definitely need a mutex because each write is made up of several separate file calls (fseek, fputs, fflush). The underlying file subsystem can't possibly know how many calls you are going to make to complete your whole operation.

So you need the mutex.

In your situation you may find you get better performance by putting the mutex outside the loop. Otherwise, switching between threads may cause excessive skipping between different parts of the disk. Hard disks take about 10ms to move the read/write head, so that could potentially slow things down a lot.

So it might be a good idea to benchmark that.
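As a rough illustration of the point about multi-call operations, here is a minimal sketch in which several threads share one stream and a mutex keeps each seek, write, and flush together. Note the assumptions: unlike the question's code, the file is opened in "w+" mode (so fseek actually positions the write) and one FILE is shared by all threads. The names run_locked_writers, slot_writer, and the constants are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_THREADS 4
#define WRITES_PER_THREAD 64
#define CHUNK_SIZE 16

/* Assumption: a single stream shared by every thread. */
static FILE *shared_file;
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread fills its own slots; the lock keeps seek+write+flush atomic
   with respect to the other threads. */
static void *slot_writer(void *arg) {
    long id = (long)arg;
    char chunk[CHUNK_SIZE];
    memset(chunk, 'A' + (int)id, sizeof chunk);

    for (long i = 0; i < WRITES_PER_THREAD; ++i) {
        long slot = i * NUM_THREADS + id;  /* slots interleave across threads */
        pthread_mutex_lock(&file_lock);
        fseek(shared_file, slot * CHUNK_SIZE, SEEK_SET);
        fwrite(chunk, 1, sizeof chunk, shared_file);
        fflush(shared_file);
        pthread_mutex_unlock(&file_lock);
    }
    return NULL;
}

/* Runs the writers and returns the final file size in bytes, or -1 on error. */
long run_locked_writers(const char *path) {
    pthread_t threads[NUM_THREADS];

    shared_file = fopen(path, "w+");
    if (shared_file == NULL)
        return -1;

    for (long t = 0; t < NUM_THREADS; ++t) {
        int rc = pthread_create(&threads[t], NULL, slot_writer, (void *)t);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; ++t)
        pthread_join(threads[t], NULL);

    fseek(shared_file, 0, SEEK_END);
    long size = ftell(shared_file);
    fclose(shared_file);
    return size;
}
```

With these constants, every one of the 4 * 64 slots of 16 bytes is written exactly once, so the returned size should be 4096 bytes. Moving the lock and unlock calls outside the for loop gives the coarser variant suggested above, which is the one worth benchmarking against this one.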




Answer 2:


You are opening a file using "append mode". According to C11:

Opening a file with append mode ('a' as the first character in the mode argument) causes all subsequent writes to the file to be forced to the then current end-of-file, regardless of intervening calls to the fseek function.

The C standard does not specify exactly how this should be implemented, but on POSIX systems it is usually implemented with the O_APPEND flag of the open function, while data is flushed using the write function. Note that the fseek call in your code should therefore have no effect on where the data is written.
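The append-mode rule quoted above can be checked with a small single-threaded sketch. It writes one string, seeks back to offset 0, writes a second string, and reads the file back. The function name append_ignores_fseek is hypothetical, and the path is whatever scratch file the caller chooses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "first", seeks to offset 0, writes "second", then reads the whole
   file back into out. Returns the number of bytes read, or -1 on error. */
long append_ignores_fseek(const char *path, char *out, size_t out_size) {
    assert(out_size > 0);
    remove(path);                       /* start from a missing/empty file */

    FILE *f = fopen(path, "a+");
    if (f == NULL)
        return -1;

    fputs("first", f);
    fflush(f);
    fseek(f, 0, SEEK_SET);              /* does not move the write position */
    fputs("second", f);                 /* still forced to end-of-file */
    fflush(f);

    fseek(f, 0, SEEK_SET);              /* reads do honor the seek */
    size_t n = fread(out, 1, out_size - 1, f);
    out[n] = '\0';
    fclose(f);
    return (long)n;
}
```

If the second write had honored the seek, the file would begin with "second". In append mode the file instead contains "firstsecond", which is why the random-access variant in the question actually behaves like a sequential append.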

I think POSIX requires this, as it describes how redirecting output in append mode (>>) is done by the shell:

Appended output redirection shall cause the file whose name results from the expansion of word to be opened for output on the designated file descriptor. The file is opened as if the open() function as defined in the System Interfaces volume of POSIX.1-2008 was called with the O_APPEND flag. If the file does not exist, it shall be created.

And since most programs use the FILE interface to send data to stdout, this probably requires fopen to use open with O_APPEND, and to use write (and not functions like pwrite) when writing data.

So if, on your system, fopen with 'a' mode uses O_APPEND, flushing is done using write, and your kernel and filesystem implement the O_APPEND flag correctly, then using a mutex should make no difference, because the writes cannot interleave:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
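A minimal sketch of that guarantee, assuming a local filesystem that implements O_APPEND correctly: each thread opens its own descriptor with O_APPEND and calls write directly, with no mutex. The names run_unlocked_appenders, appender, and the constants are hypothetical, and the sketch bypasses stdio so that each chunk maps to exactly one write call.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define APPEND_THREADS 4
#define APPENDS_PER_THREAD 256
#define APPEND_CHUNK 64

static const char *append_path;  /* set once before the threads start */

/* Appends fixed-size chunks through a private O_APPEND descriptor, no lock. */
static void *appender(void *arg) {
    (void)arg;
    char chunk[APPEND_CHUNK];
    memset(chunk, 'x', sizeof chunk);

    int fd = open(append_path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return NULL;
    for (int i = 0; i < APPENDS_PER_THREAD; ++i) {
        /* The kernel moves the offset to end-of-file and writes in one step. */
        ssize_t n = write(fd, chunk, sizeof chunk);
        (void)n;
    }
    close(fd);
    return NULL;
}

/* Runs the appenders and returns the final file size, or -1 on error. */
long run_unlocked_appenders(const char *path) {
    pthread_t threads[APPEND_THREADS];
    struct stat st;

    append_path = path;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    close(fd);

    for (int t = 0; t < APPEND_THREADS; ++t) {
        int rc = pthread_create(&threads[t], NULL, appender, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < APPEND_THREADS; ++t)
        pthread_join(threads[t], NULL);

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_size;
}
```

On a filesystem that honors the quoted rule, no append overwrites another, so the size should come out to 4 * 256 * 64 = 65536 bytes. A smaller result would point at the filesystem or kernel not implementing O_APPEND as specified, which is the caveat raised next.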

Note that not all filesystems support this behavior. Check this answer.


As for my answer to your previous question, my suggestion was to remove the mutex, as it should have no effect on the size of the file (and it didn't have any effect on my machine).

Personally, I never really used O_APPEND and would be hesitant to do so, as its behavior might not be supported at some level, plus its behavior is weird on Linux (see "bugs" section of pwrite).



Source: https://stackoverflow.com/questions/49377419/do-we-need-mutex-to-perform-multithreading-file-io
