Really force file sync/flush in Java

别来无恙 posted on 2019-11-28 04:07:18

You need to tell us more about the hardware and operating system, also the specific Java version. How are you measuring this throughput?

You're correct that force/sync should force the data out to the physical media.


Here's a raw version of copy. Compiled with gcc 4.0 on an Intel Mac, should be clean.

/* rawcopy -- pure C, system calls only, copy argv[1] to argv[2] */

/* This is a test program which simply copies from file to file using
 * only system calls (section 2 of the manual.)
 *
 * Compile:
 *
 *      gcc -Wall -DBUFSIZ=1024 -o rawcopy rawcopy.c
 *
 * If DIRTY is defined, then errors are interpreted with perror(3).
 * This is ifdef'd so that the CLEAN version is free of stdio.  For
 * convenience I'm using BUFSIZ from stdio.h; to compile CLEAN just
 * use the value from your stdio.h in place of 1024 above.
 *
 * Compile DIRTY:
 *
 *      gcc -DDIRTY -Wall -o rawcopy rawcopy.c
 *
 */
#include <fcntl.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <stdlib.h>
#include <unistd.h>
#if defined(DIRTY)
#   if defined(BUFSIZ)
#       error "Don't define your own BUFSIZ when DIRTY"
#   endif
#   include <stdio.h>
#   define PERROR perror(argv[0])
#else
#   define CLEAN
#   define PERROR
#   if ! defined(BUFSIZ)
#       error "You must define your own BUFSIZ with -DBUFSIZ=<number>"
#   endif
#endif

#include <errno.h>              /* errno: include the header rather than
                                   declaring "extern int errno" by hand */

char buffer[BUFSIZ];            /* by definition stdio BUFSIZ should
                                   be optimal size for read/write */

int main(int argc, char * argv[]) {
    int fdi, fdo ;              /* Input/output file descriptors */
    ssize_t len ;               /* length to read/write */
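    if(argc != 3){
        /* wrong argument count: errno is not set here, so exit with a
           fixed failure status instead of exit(errno) */
        exit(EXIT_FAILURE);
    }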

    /* Open the files, returning perror errno as the exit value if fails. */
    if((fdi = open(argv[1],O_RDONLY)) == -1){
        PERROR;
        exit(errno);
    }
    /* O_CREAT requires a mode argument; O_TRUNC discards old contents. */
    if((fdo = open(argv[2], O_WRONLY|O_CREAT|O_TRUNC, 0644)) == -1){
        PERROR;
        exit(errno);
    }

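    /* copy BUFSIZ bytes (or total read on last block) fast as you
       can.  read() returns 0 at end of file and -1 on error. */
    while((len = read(fdi, (void *) buffer, BUFSIZ)) > 0){
        ssize_t off = 0, n ;    /* write() may be short; loop until done */
        while(off < len){
            n = write(fdo, (char *) buffer + off, len - off);
            if(n == -1){
                PERROR;
                exit(errno);
            }
            off += n;
        }
    }
    if(len == -1){              /* the loop ended on a read error */
        PERROR;
        exit(errno);
    }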
    /* fsync the output file, then close both files */
    if(fsync(fdo) ==-1){
        PERROR;
        exit(errno);
    }
    if(close(fdo) == -1){
        PERROR;
        exit(errno);
    }
    if(close(fdi) == -1){
        PERROR;
        exit(errno);
    }

    /* if it survived to here, all worked. */
    exit(0);
}
araqnid

Actually, in C you want to just call fsync() on the one file descriptor, not sync() (or the "sync" command) which signals the kernel to flush all buffers to disk system-wide.

If you strace (getting Linux-specific here) the JVM you should be able to observe an fsync() or fdatasync() system call being made on your output file. That would be what I'd expect the getFD().sync() call to do. As far as I know, c.force(true) likewise issues the sync at the moment you call it; it does not flag the channel so that every later write is synced. It might simply be that the JVM you're using doesn't actually implement the sync() call?

I'm not sure why you weren't seeing any difference when calling "sync" as a command, but after the first sync invocation, subsequent ones are usually quite a lot faster because there is little left to flush. Again, I'd be inclined to break out strace (truss on Solaris) as a "what's actually happening here?" tool.

It is a good idea to use synchronized I/O data integrity completion. However, your C sample uses the wrong call: sync() flushes the whole OS, not a single file.

If you want to write the blocks of that single file to disk, you need to use fsync(2) or fdatasync(2) in C. BTW: when you use buffered stdio in C (or a BufferedOutputStream or some Writer in Java), you need to flush that buffer first, before you sync.
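The flush-then-sync order can be sketched in C as follows. This is a minimal sketch, not code from the original question: the helper name write_durably is hypothetical, and it assumes a POSIX system where fileno() and fsync() are available.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write text to path through stdio, then push it
   all the way to the storage device.
   Returns 0 on success, -1 on any failure. */
int write_durably(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    if (fputs(text, fp) == EOF          /* data sits in the stdio buffer    */
        || fflush(fp) == EOF            /* stdio buffer -> kernel page cache */
        || fsync(fileno(fp)) == -1) {   /* page cache -> storage device      */
        fclose(fp);
        return -1;
    }
    return fclose(fp) == EOF ? -1 : 0;
}
```

Calling fsync() without the fflush() would only sync whatever stdio had already handed to the kernel, so the tail of the data could still be sitting in the process's own buffer.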

The fdatasync() variant can be a bit more efficient, especially if the file has not changed size since the last sync, because it only flushes the metadata needed to read the data back. That also means it might not persist all the metadata (timestamps, for example). If you want to write your own transaction-safe database system, you need to observe some more stuff (like fsyncing the parent directory).
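As a rough illustration of the parent-directory point, here is a sketch under some assumptions: the helper names fsync_dir and save_new_file are made up for this example, and it assumes a POSIX system whose file system allows fsync() on a directory descriptor opened read-only (Linux does; other platforms may differ).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fsync the directory itself so that a newly
   created or renamed entry inside it survives a crash. */
int fsync_dir(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    if (fd == -1)
        return -1;
    int rc = fsync(fd);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}

/* Hypothetical helper: create filepath (which must live in dirpath),
   write len bytes, then flush both the file and its directory entry.
   Returns 0 on success, -1 on any failure. */
int save_new_file(const char *dirpath, const char *filepath,
                  const void *data, size_t len)
{
    int fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = data;
    while (len > 0) {                   /* write() may be short */
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    if (fsync(fd) == -1) {              /* file data and metadata */
        close(fd);
        return -1;
    }
    if (close(fd) == -1)
        return -1;
    return fsync_dir(dirpath);          /* the directory entry itself */
}
```

The file's fsync() covers its contents; the directory's fsync() covers the fact that the file exists under that name. A real database would typically also handle EINTR and the rename-into-place pattern, which this sketch leaves out.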

The C code could be suboptimal, because it uses stdio rather than raw OS write(). But then, Java could be faster because it allocates larger buffers?

Anyway, you can only trust the API documentation. The rest is beyond your duties.

(I know this is a very late reply, but I ran into this thread doing a Google search, and that's probably how you ended up here too.)

You're calling sync() in Java on a single file descriptor, so only the buffers related to that one file get flushed out to disk.

In C and command-line, you're calling sync() on the entire operating system - so every file buffer gets flushed out to disk, for everything your O/S is doing.

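To be comparable, the C call should be fsync(fd) on that one descriptor. If you want something narrower than sync() but still file-system-wide, Linux offers syncfs(fd), which takes a file descriptor (not a FILE *).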

From the Linux man page:

   sync() causes all buffered modifications to file metadata and data to
   be written to the underlying file systems.

   syncfs() is like sync(), but synchronizes just the file system
   containing file referred to by the open file descriptor fd.