Is there really no asynchronous block I/O on Linux?


Question


Consider an application that is CPU bound, but also has high-performance I/O requirements.

I'm comparing Linux file I/O to Windows, and I can't see how epoll will help a Linux program at all. The kernel will tell me that the file descriptor is "ready for reading," but I still have to call blocking read() to get my data, and if I want to read megabytes, it's pretty clear that that will block.
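Concretely, the readiness model looks something like this minimal sketch (note that epoll_ctl() in fact refuses regular-file descriptors with EPERM, so the pattern only applies to pipes, sockets, and similar descriptors):

#include <sys/epoll.h>
#include <unistd.h>

/* Readiness-based I/O: epoll says the descriptor is readable, but the
 * read() that follows still performs the actual data transfer.
 * Note: epoll_ctl() rejects regular files with EPERM -- this model
 * only works for pipes, sockets, and other "slow" descriptors. */
ssize_t wait_and_read(int fd, char *buf, size_t len)
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    if (ep < 0 || epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;
    epoll_wait(ep, &ev, 1, -1);      /* wait until fd is "ready" */
    ssize_t n = read(fd, buf, len);  /* the transfer itself -- for
                                        megabytes, this is where the
                                        time actually goes */
    close(ep);
    return n;
}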

On Windows, I can create a file handle with OVERLAPPED set, and then use non-blocking I/O, and get notified when the I/O completes, and use the data from that completion function. I need to spend no application-level wall-clock time waiting for data, which means I can precisely tune my number of threads to my number of cores, and get 100% efficient CPU utilization.

If I have to emulate asynchronous I/O on Linux, then I have to allocate some number of threads to do this, and those threads will spend a little bit of time doing CPU things, and a lot of time blocking for I/O, plus there will be overhead in the messaging to/from those threads. Thus, I will either over-subscribe or under-utilize my CPU cores.

I looked at mmap() + madvise() (WILLNEED) as a "poor man's async I/O" but it still doesn't get all the way there, because I can't get a notification when it's done -- I have to "guess" and if I guess "wrong" I will end up blocking on memory access, waiting for data to come from disk.
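For reference, that approach boils down to something like this sketch:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* "Poor man's async I/O": map the file, then hint the kernel to start
 * reading pages in the background. There is no completion notification;
 * touching a page that has not arrived yet simply blocks. */
char *map_with_readahead(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;
    struct stat st;
    fstat(fd, &st);
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                /* mapping survives close */
    if (p == MAP_FAILED) return NULL;
    madvise(p, st.st_size, MADV_WILLNEED);    /* start readahead now */
    *out_len = st.st_size;
    return p;  /* first access to a cold page can still block */
}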

Linux seems to have the starts of async I/O in io_submit, and it seems to also have a user-space POSIX aio implementation, but it's been that way for a while, and I know of nobody who would vouch for these systems for critical, high-performance applications.

The Windows model works roughly like this:

  1. Issue an asynchronous operation.
  2. Tie the asynchronous operation to a particular I/O completion port.
  3. Wait on operations to complete on that port.
  4. When the I/O is complete, the thread waiting on the port unblocks, and returns a reference to the pending I/O operation.

Steps 1/2 are typically done as a single thing. Steps 3/4 are typically done with a pool of worker threads, not (necessarily) the same thread that issues the I/O. This model is somewhat similar to the one provided by boost::asio, except boost::asio doesn't actually give you asynchronous block-based (disk) I/O.
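In code, those four steps come out roughly like this minimal Win32 sketch (error handling omitted; "data.bin" is a placeholder):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Steps 1/2: open for overlapped I/O and tie the handle to a port. */
    HANDLE file = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    HANDLE port = CreateIoCompletionPort(file, NULL, 1 /* key */, 0);

    static char buf[1 << 20];
    OVERLAPPED ov = {0};                         /* read at offset 0 */
    ReadFile(file, buf, sizeof buf, NULL, &ov);  /* returns immediately;
                                                    ERROR_IO_PENDING expected */

    /* Steps 3/4: a worker thread (here, the same thread) waits on the
     * port and wakes when the transfer has actually completed. */
    DWORD bytes; ULONG_PTR key; OVERLAPPED *done;
    GetQueuedCompletionStatus(port, &bytes, &key, &done, INFINITE);
    printf("completed: %lu bytes\n", (unsigned long)bytes);

    CloseHandle(port);
    CloseHandle(file);
    return 0;
}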

The difference from epoll on Linux is that in step 4, no I/O has yet happened -- epoll effectively hoists step 1 to come after step 4, which is "backwards" if you already know exactly what you need.

Having programmed a large number of embedded, desktop, and server operating systems, I can say that this model of asynchronous I/O is very natural for certain kinds of programs. It is also very high-throughput and low-overhead. I think this is one of the remaining real shortcomings of the Linux I/O model, at the API level.


Answer 1:


The real answer, which was indirectly pointed to by Peter Teoh, is based on io_setup() and io_submit(). Specifically, the "aio_" functions indicated by Peter are part of the glibc user-level emulation based on threads, which is not an efficient implementation. The real answer is in:

io_submit(2)
io_setup(2)
io_cancel(2)
io_destroy(2)
io_getevents(2)

Note that the man page, dated 2012-08, says that this implementation has not yet matured to the point where it can replace the glibc user-space emulation:

http://man7.org/linux/man-pages/man7/aio.7.html

this implementation hasn't yet matured to the point where the POSIX AIO implementation can be completely reimplemented using the kernel system calls.

So, according to the latest kernel documentation I can find, Linux does not yet have a mature, kernel-based asynchronous I/O model. And, if I assume that the documented model is actually mature, it still doesn't support partial I/O in the sense of recv() vs read().
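For illustration, a minimal sketch of driving this interface through the libaio wrapper library (link with -laio; "data.bin" is a placeholder, error handling is omitted, and note that this kernel AIO path is only truly asynchronous with O_DIRECT):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <libaio.h>             /* wrappers for io_setup()/io_submit()/... */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY | O_DIRECT);

    io_context_t ctx = 0;                 /* must start zeroed */
    io_setup(32, &ctx);                   /* up to 32 in-flight requests */

    void *buf;
    posix_memalign(&buf, 4096, 4096);     /* O_DIRECT needs aligned buffers */

    struct iocb cb, *cbs[1] = { &cb };
    io_prep_pread(&cb, fd, buf, 4096, 0); /* 4 KiB read at offset 0 */
    io_submit(ctx, 1, cbs);               /* issue; returns immediately */

    struct io_event ev;
    io_getevents(ctx, 1, 1, &ev, NULL);   /* block until it completes */
    printf("read returned %lld\n", (long long)ev.res);

    io_destroy(ctx);
    return 0;
}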




Answer 2:


As explained in:

http://code.google.com/p/kernel/wiki/AIOUserGuide

and here:

http://www.ibm.com/developerworks/library/l-async/

Linux does provide async block I/O at the kernel level; the APIs are as follows:

aio_read    Request an asynchronous read operation
aio_error   Check the status of an asynchronous request
aio_return  Get the return status of a completed asynchronous request
aio_write   Request an asynchronous write operation
aio_suspend Suspend the calling process until one or more asynchronous requests have completed (or failed)
aio_cancel  Cancel an asynchronous I/O request
lio_listio  Initiate a list of I/O operations
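
A minimal sketch of the aio_* interface follows ("data.bin" is a placeholder; with glibc, link with -lrt, and keep in mind that glibc services these calls with a userspace thread pool, as Answer 1 notes):

#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    static char buf[4096];

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;                  /* which file */
    cb.aio_buf    = buf;                 /* where the data goes */
    cb.aio_nbytes = sizeof buf;          /* how much to read */
    cb.aio_offset = 0;                   /* from which offset */

    aio_read(&cb);                       /* request; returns immediately */

    const struct aiocb *list[1] = { &cb };
    aio_suspend(list, 1, NULL);          /* block until it completes */
    if (aio_error(&cb) == 0)
        printf("read returned %zd\n", aio_return(&cb));
    return 0;
}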

And if you ask who the users of these APIs are, it is the kernel itself -- just a small subset is shown here:

./drivers/net/tun.c (for network tunnelling):
static ssize_t tun_chr_aio_read(struct kiocb *iocb, const struct iovec *iv,

./drivers/usb/gadget/inode.c:
ep_aio_read(struct kiocb *iocb, const struct iovec *iov,

./net/socket.c (general socket programming):
static ssize_t sock_aio_read(struct kiocb *iocb, const struct iovec *iov,

./mm/filemap.c (mmap of files):
generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,

./mm/shmem.c:
static ssize_t shmem_file_aio_read(struct kiocb *iocb,

etc.

At the userspace level, there is also the io_submit() etc. API (via glibc), but the following article offers an alternative to relying on glibc:

http://www.fsl.cs.sunysb.edu/~vass/linux-aio.txt

It implements functions like io_setup() as direct syscalls (bypassing any glibc dependency); a kernel entry point with the matching "__NR_io_setup" syscall number must therefore exist. Searching the kernel source at:

http://lxr.free-electrons.com/source/include/linux/syscalls.h#L474 (the URL applies to kernel version 3.13) turns up the declarations of these io_*() APIs in the kernel:

474 asmlinkage long sys_io_setup(unsigned nr_reqs, aio_context_t __user *ctx);
475 asmlinkage long sys_io_destroy(aio_context_t ctx);
476 asmlinkage long sys_io_getevents(aio_context_t ctx_id,
481 asmlinkage long sys_io_submit(aio_context_t, long,
483 asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,

Later versions of glibc should make this use of syscall() to reach sys_io_setup() unnecessary, but if you don't have the latest glibc, you can always make the calls yourself, provided you are running a newer kernel that offers sys_io_setup().
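If your glibc lacks the wrappers, the hand-rolled syscall() approach amounts to something like this sketch:

#include <linux/aio_abi.h>   /* aio_context_t, struct iocb, struct io_event */
#include <sys/syscall.h>
#include <unistd.h>
#include <time.h>

/* glibc does not export io_setup()/io_submit() itself, so we invoke the
 * kernel entry points directly via syscall(2). */
static inline long my_io_setup(unsigned nr_reqs, aio_context_t *ctx)
{
    return syscall(__NR_io_setup, nr_reqs, ctx);
}

static inline long my_io_submit(aio_context_t ctx, long nr,
                                struct iocb **iocbpp)
{
    return syscall(__NR_io_submit, ctx, nr, iocbpp);
}

static inline long my_io_getevents(aio_context_t ctx, long min_nr, long nr,
                                   struct io_event *events,
                                   struct timespec *timeout)
{
    return syscall(__NR_io_getevents, ctx, min_nr, nr, events, timeout);
}

static inline long my_io_destroy(aio_context_t ctx)
{
    return syscall(__NR_io_destroy, ctx);
}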

Of course, there are other userspace options for asynchronous I/O (e.g., using signals):

http://personal.denison.edu/~bressoud/cs375-s13/supplements/linux_altIO.pdf

or perhaps:

What is the status of POSIX asynchronous I/O (AIO)?

"io_submit" and friends are still not available in glibc (see io_submit manpages), which I have verified in my Ubuntu 14.04, but this API is linux-specific.

Others like libuv, libev, and libevent also provide asynchronous APIs:

http://nikhilm.github.io/uvbook/filesystem.html#reading-writing-files

http://software.schmorp.de/pkg/libev.html

http://libevent.org/

All of these APIs aim to be portable across BSD, Linux, macOS, and even Windows.

In terms of performance I have not seen any numbers, but I suspect libuv may be the fastest, given how lightweight it is:

https://ghc.haskell.org/trac/ghc/ticket/8400




Answer 3:


(2019) If you're using a 5.1 or above kernel you can use the io_uring interface for file-like I/O and get excellent asynchronous operation.

Compared to the existing libaio/KAIO interface io_uring has the following advantages:

  • Works with buffered AND direct I/O
  • Easier to use
  • Can optionally work in a polled manner
  • Less bookkeeping space overhead per I/O
  • Lower CPU overhead due to fewer userspace/kernel syscall context switches (a big deal these days due to the impact of Spectre/Meltdown mitigations)
  • "Linked mode" that can be used to express dependencies between groups of I/Os (>=5.3 kernel)
  • Doesn't become blocking each time the stars aren't perfectly aligned

Compared to glibc's POSIX aio, io_uring has the following advantages:

  • Much faster and more efficient (the lower-overhead benefits from above apply even more so here)
  • The interface is kernel-backed and DOESN'T use a userspace thread pool
  • Zero-copy between userspace and the kernel, even when doing buffered I/O
  • glibc's POSIX AIO can't have more than one I/O in flight on a single file descriptor, whereas io_uring most certainly can!

The "Efficient IO with io_uring" document goes into far more detail as to io_uring's benefits and usage. There's also a "Faster IO through io_uring" videoed presentation by io_uring author Jens Axboe.

Re "support partial I/O in the sense of recv() vs read()": a patch went into the 5.3 kernel that will automatically retry io_uring short reads. A yet-to-land patch (which I guess will appear in the 5.4 kernel) tweaks the behaviour further and only automatically takes care of short reads when working with "regular" files when the request isn't REQ_F_NOWAIT (it looks like you can request REQ_F_NOWAIT via IOCB_NOWAIT or by opening the file with O_NONBLOCK). Thus it looks like you can get recv() style- "short" I/O behaviour from io_uring too.

Obviously at the time of writing the io_uring interface is very new but hopefully it will usher in a better asynchronous file-based I/O story for Linux.




Answer 4:


For network socket I/O, when it is "ready", it doesn't block. That's what O_NONBLOCK and "ready" mean.

For disk I/O, we have POSIX AIO, Linux AIO, sendfile(), and friends.



Source: https://stackoverflow.com/questions/13407542/is-there-really-no-asynchronous-block-i-o-on-linux
