How do I read the output of a child process without blocking in Rust?

生来不讨喜 2020-11-27 19:39

I'm making a small ncurses application in Rust that needs to communicate with a child process. I already have a prototype written in Common Lisp. I'm trying to rewrite it

2 Answers
  • 2020-11-27 20:18

    Tokio's Command

    Here is an example of using tokio 0.2:

    use std::process::Stdio;
    use futures::StreamExt; // 0.3.1
    use tokio::{io::BufReader, prelude::*, process::Command}; // 0.2.4, features = ["full"]
    
    #[tokio::main]
    async fn main() {
        let mut cmd = Command::new("/tmp/slow.bash")
            .stdout(Stdio::piped()) // Can do the same for stderr
            .spawn()
            .expect("cannot spawn");
    
        let stdout = cmd.stdout().take().expect("no stdout");
        // Can do the same for stderr
    
        // To print out each line
        // BufReader::new(stdout)
        //     .lines()
        //     .for_each(|s| async move { println!("> {:?}", s) })
        //     .await;
    
        // To print out each line *and* collect it all into a Vec
        let result: Vec<_> = BufReader::new(stdout)
            .lines()
            .inspect(|s| println!("> {:?}", s))
            .collect()
            .await;
    
        println!("All the lines: {:?}", result);
    }
    

    Tokio-Threadpool

    Here is an example of using tokio 0.1 and tokio-threadpool. We spawn the process, perform the blocking reads inside the blocking function so they run on the thread pool, and convert that to a stream with stream::poll_fn:

    use std::process::{Command, Stdio};
    use tokio::{prelude::*, runtime::Runtime}; // 0.1.18
    use tokio_threadpool; // 0.1.13
    
    fn stream_command_output(
        mut command: Command,
    ) -> impl Stream<Item = Vec<u8>, Error = tokio_threadpool::BlockingError> {
        // Ensure that the output is available to read from and start the process
        let mut child = command
            .stdout(Stdio::piped())
            .spawn()
            .expect("cannot spawn");
        let mut stdout = child.stdout.take().expect("no stdout");
    
        // Create a stream of data
        stream::poll_fn(move || {
            // Perform blocking IO
            tokio_threadpool::blocking(|| {
                // Allocate some space to store anything read
                let mut data = vec![0; 128];
                // Read 1-128 bytes of data
                let n_bytes_read = stdout.read(&mut data).expect("cannot read");
    
                if n_bytes_read == 0 {
                    // Stdout is done
                    None
                } else {
                    // Only return as many bytes as we read
                    data.truncate(n_bytes_read);
                    Some(data)
                }
            })
        })
    }
    
    fn main() {
        let output_stream = stream_command_output(Command::new("/tmp/slow.bash"));
    
        let mut runtime = Runtime::new().expect("Unable to start the runtime");
    
        let result = runtime.block_on({
            output_stream
                .map(|d| String::from_utf8(d).expect("Not UTF-8"))
                .fold(Vec::new(), |mut v, s| {
                    print!("> {}", s);
                    v.push(s);
                    Ok(v)
                })
        });
    
        println!("All the lines: {:?}", result);
    }
    

    There are numerous possible tradeoffs that can be made here. For example, always allocating 128 bytes isn't ideal, but it's simple to implement.

    Support

    For reference, here's slow.bash:

    #!/usr/bin/env bash
    
    set -eu
    
    val=0
    
    while [[ $val -lt 10 ]]; do
        echo $val
        val=$(($val + 1))
        sleep 1
    done
    

    See also:

    • How do I synchronously return a value calculated in an asynchronous Future in stable Rust?
  • 2020-11-27 20:23

    Streams are blocking by default. TCP/IP streams, filesystem streams, pipe streams, they are all blocking. When you tell a stream to give you a chunk of bytes it will stop and wait till it has the given amount of bytes or till something else happens (an interrupt, an end of stream, an error).

    The operating systems are eager to return the data to the reading process, so if all you want is to wait for the next line and handle it as soon as it comes in then the method suggested by Shepmaster in Unable to pipe to or from spawned child process more than once (and also in his answer here) works.
    Though in theory it doesn't have to work, because an operating system is allowed to make the BufReader wait for more data in read, but in practice the operating systems prefer the early "short reads" to waiting.

    This simple BufReader-based approach becomes even more dangerous when you need to handle multiple streams (like the stdout and stderr of a child process) or multiple processes. For example, a BufReader-based approach might deadlock when a child process waits for you to drain its stderr pipe while your process is blocked waiting on its empty stdout.
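
One way to avoid that particular deadlock, sketched here with only the standard library, is to drain each pipe on its own thread. The helper name `run_and_capture` and the use of `sh -c` are assumptions made for illustration; they are not part of the original answer.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;

/// Hypothetical helper: runs `script` under `sh -c` and returns everything
/// the child wrote to stdout and stderr. Each pipe is drained on its own
/// thread, so a full stderr pipe cannot stall the child while we are
/// blocked reading stdout (or the other way round).
fn run_and_capture(script: &str) -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let mut stdout = child.stdout.take().expect("stdout was piped");
    let mut stderr = child.stderr.take().expect("stderr was piped");

    // Both reads block, but each blocks only its own thread.
    let out_thread = thread::spawn(move || {
        let mut buf = Vec::new();
        stdout.read_to_end(&mut buf).map(|_| buf)
    });
    let err_thread = thread::spawn(move || {
        let mut buf = Vec::new();
        stderr.read_to_end(&mut buf).map(|_| buf)
    });

    let out = out_thread.join().expect("stdout reader panicked")?;
    let err = err_thread.join().expect("stderr reader panicked")?;
    child.wait()?;
    Ok((out, err))
}

fn main() -> std::io::Result<()> {
    let (out, err) = run_and_capture("echo to-stdout; echo to-stderr >&2")?;
    println!("stdout: {:?}", String::from_utf8_lossy(&out));
    println!("stderr: {:?}", String::from_utf8_lossy(&err));
    Ok(())
}
```

If you only need the complete output after the child exits, `Child::wait_with_output` in the standard library collects both pipes for you; the threads are worth the trouble mainly when you want to handle the data as it arrives.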

    Similarly, you can't use BufReader when you don't want your program to wait on the child process indefinitely. Maybe you want to display a progress bar or a timer while the child is still working and gives you no output.
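
A minimal sketch of the "timer while the child is silent" case, assuming a reader thread and a standard-library channel: the blocking BufReader lives in the thread, and the primary thread waits with `recv_timeout` so it can do something else on every tick. The names `spawn_line_reader` and `collect_with_ticks` are hypothetical.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: spawns `sh -c script` and forwards every stdout
/// line over a channel from a background thread.
fn spawn_line_reader(script: &str) -> std::io::Result<mpsc::Receiver<String>> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(script)
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    let (tx, rx) = mpsc::channel();

    thread::spawn(move || {
        for line in BufReader::new(stdout).lines() {
            // Stop on a read error or once the receiver has gone away.
            let line = match line {
                Ok(line) => line,
                Err(_) => break,
            };
            if tx.send(line).is_err() {
                break;
            }
        }
        // Reap the child so it does not linger as a zombie.
        let _ = child.wait();
    });
    Ok(rx)
}

/// Collects lines until the child closes stdout. Returns the lines and the
/// number of ticks during which nothing arrived within `tick`.
fn collect_with_ticks(rx: &mpsc::Receiver<String>, tick: Duration) -> (Vec<String>, u32) {
    let mut lines = Vec::new();
    let mut idle_ticks = 0;
    loop {
        match rx.recv_timeout(tick) {
            Ok(line) => {
                println!("> {}", line);
                lines.push(line);
            }
            // No output yet: this is where a progress bar or timer would be redrawn.
            Err(RecvTimeoutError::Timeout) => {
                idle_ticks += 1;
                println!("... still waiting");
            }
            // The reader thread finished, so the child closed its stdout.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    (lines, idle_ticks)
}

fn main() -> std::io::Result<()> {
    let rx = spawn_line_reader("echo start; sleep 1; echo end")?;
    let (lines, idle_ticks) = collect_with_ticks(&rx, Duration::from_millis(250));
    println!("{} lines, {} idle ticks", lines.len(), idle_ticks);
    Ok(())
}
```

The read itself is still blocking; it has only been moved off the thread that drives the UI.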

    You can't use the BufReader-based approach if your operating system happens not to be eager in returning the data to the process (prefers "full reads" to "short reads") because in that case the last few lines printed by the child process might end up in a gray zone: the operating system got them, but they're not large enough to fill the BufReader's buffer.

    BufReader is limited to what the Read interface allows it to do with the stream; it's no less blocking than the underlying stream is. In order to be efficient it will read the input in chunks, telling the operating system to fill as much of its buffer as it has available.

    You might be wondering why reading data in chunks is so important here, why can't the BufReader just read the data byte by byte. The problem is that to read the data from a stream we need the operating system's help. On the other hand, we are not the operating system, we work isolated from it, so as not to mess with it if something goes wrong with our process. So in order to call to the operating system there needs to be a transition to "kernel mode" which might also incur a "context switch". That is why calling the operating system to read every single byte is expensive. We want as few OS calls as possible and so we get the stream data in batches.
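
To make the batching concrete, here is a small sketch that counts how many times the underlying reader is asked for data. `CountingReader` is a hypothetical wrapper, and an in-memory byte slice stands in for the pipe, so the count is a proxy for OS calls rather than a measurement of real system calls.

```rust
use std::io::{self, BufReader, Read};

/// Hypothetical wrapper that counts calls to `read` on the inner reader.
/// With a real pipe, each such call would be a trip into the kernel.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Consumes `data` one byte at a time directly from the counting reader.
fn calls_unbuffered(data: &[u8]) -> usize {
    let mut reader = CountingReader { inner: data, calls: 0 };
    let mut byte = [0u8; 1];
    while reader.read(&mut byte).expect("in-memory read") == 1 {}
    reader.calls
}

/// Consumes `data` one byte at a time, but through a BufReader, which
/// fetches from the counting reader in large chunks.
fn calls_buffered(data: &[u8]) -> usize {
    let mut reader = BufReader::new(CountingReader { inner: data, calls: 0 });
    let mut byte = [0u8; 1];
    while reader.read(&mut byte).expect("in-memory read") == 1 {}
    reader.get_ref().calls
}

fn main() {
    let data = vec![b'x'; 10_000];
    println!("unbuffered: {} read calls", calls_unbuffered(&data));
    println!("buffered:   {} read calls", calls_buffered(&data));
}
```

For 10,000 bytes the unbuffered loop makes 10,001 calls (one per byte plus the final end-of-stream read), while the buffered one needs only a handful, depending on BufReader's buffer capacity.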

    To wait on a stream without blocking you'd need a non-blocking stream. MIO promises to have the required non-blocking stream support for pipes, most probably with PipeReader, but I haven't checked it out so far.

    The non-blocking nature of a stream should make it possible to read data in chunks regardless of whether the operating system prefers the "short reads" or not, because a non-blocking stream never blocks. If there is no data in the stream it simply tells you so.
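
The "it simply tells you so" behaviour can be seen with the standard library alone. The sketch below is Unix-only and uses a `UnixStream` pair as a stand-in for a child's pipe, because `std` offers `set_nonblocking` on sockets but not on `ChildStdout`; that substitution and the `try_read` helper are assumptions for illustration.

```rust
use std::io::{ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Hypothetical helper: attempts one read without blocking.
/// Returns Ok(None) when the stream has no data available right now,
/// Ok(Some(0)) at end of stream, and Ok(Some(n)) when n bytes were read.
fn try_read(stream: &mut UnixStream, buf: &mut [u8]) -> std::io::Result<Option<usize>> {
    match stream.read(buf) {
        Ok(n) => Ok(Some(n)),
        // A non-blocking stream reports "nothing yet" as an error kind.
        Err(e) if e.kind() == ErrorKind::WouldBlock => Ok(None),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let (mut writer, mut reader) = UnixStream::pair()?;
    reader.set_nonblocking(true)?;
    let mut buf = [0u8; 64];

    // Nothing has been written yet, so the read returns immediately.
    println!("before write: {:?}", try_read(&mut reader, &mut buf)?);

    writer.write_all(b"hello\n")?;
    println!("after write:  {:?}", try_read(&mut reader, &mut buf)?);

    // Closing the writing end turns the next read into end of stream.
    drop(writer);
    println!("after close:  {:?}", try_read(&mut reader, &mut buf)?);
    Ok(())
}
```

In a real program you would not call `try_read` in a tight loop; an event loop such as the one MIO provides tells you when the stream becomes readable.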

    In the absence of a non-blocking stream you'll have to resort to spawning threads so that the blocking reads would be performed in a separate thread and thus won't block your primary thread. You might also want to read the stream byte by byte in order to react to the line separator immediately in case the operating system does not prefer the "short reads". Here's a working example: https://gist.github.com/ArtemGr/db40ae04b431a95f2b78.

    P.S. Here's an example of a function that lets you monitor the standard output of a program via a shared vector of bytes:

    use std::io::Read;
    use std::process::{Command, Stdio};
    use std::sync::{Arc, Mutex};
    use std::thread;
    
    /// Pipe streams are blocking, we need separate threads to monitor them without blocking the primary thread.
    fn child_stream_to_vec<R>(mut stream: R) -> Arc<Mutex<Vec<u8>>>
    where
        R: Read + Send + 'static,
    {
        let out = Arc::new(Mutex::new(Vec::new()));
        let vec = out.clone();
        thread::Builder::new()
            .name("child_stream_to_vec".into())
            .spawn(move || loop {
                let mut buf = [0];
                match stream.read(&mut buf) {
                    Err(err) => {
                        println!("{}] Error reading from stream: {}", line!(), err);
                        break;
                    }
                    Ok(got) => {
                        if got == 0 {
                            break;
                        } else if got == 1 {
                            vec.lock().expect("!lock").push(buf[0])
                        } else {
                            println!("{}] Unexpected number of bytes: {}", line!(), got);
                            break;
                        }
                    }
                }
            })
            .expect("!thread");
        out
    }
    
    fn main() {
        let mut cat = Command::new("cat")
            .stdin(Stdio::piped())
            .stdout(Stdio::piped())
            .stderr(Stdio::piped())
            .spawn()
            .expect("!cat");
    
        let out = child_stream_to_vec(cat.stdout.take().expect("!stdout"));
        let err = child_stream_to_vec(cat.stderr.take().expect("!stderr"));
        let mut stdin = match cat.stdin.take() {
            Some(stdin) => stdin,
            None => panic!("!stdin"),
        };
    }
    

    With a couple of helpers I'm using it to control an SSH session:

    try_s! (stdin.write_all (b"echo hello world\n"));
    try_s! (wait_forˢ (&out, 0.1, 9., |s| s == "hello world\n"));
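
The helpers themselves are not shown in the answer. As a rough idea of what such a helper could look like, here is a hypothetical `wait_for` that polls the shared vector until a predicate accepts its contents or a timeout elapses. Its signature (two `Duration`s instead of the two numbers in the call above) and its error type are assumptions, not the author's implementation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: checks the shared buffer every `interval` until
/// `pred` accepts its contents (decoded as lossy UTF-8) or `timeout` elapses.
fn wait_for<F>(
    buf: &Arc<Mutex<Vec<u8>>>,
    interval: Duration,
    timeout: Duration,
    pred: F,
) -> Result<String, String>
where
    F: Fn(&str) -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        // The lock is held only for this statement, so the reader thread
        // can keep appending bytes while we sleep.
        let text = String::from_utf8_lossy(
            &buf.lock().map_err(|_| "buffer mutex poisoned".to_string())?,
        )
        .into_owned();

        if pred(&text) {
            return Ok(text);
        }
        if Instant::now() >= deadline {
            return Err(format!("timed out; buffer so far: {:?}", text));
        }
        thread::sleep(interval);
    }
}

fn main() {
    // Simulate the reader thread from child_stream_to_vec with a thread
    // that appends the expected output after a short delay.
    let out = Arc::new(Mutex::new(Vec::new()));
    let writer = Arc::clone(&out);
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        writer.lock().expect("!lock").extend_from_slice(b"hello world\n");
    });

    let result = wait_for(
        &out,
        Duration::from_millis(10),
        Duration::from_secs(2),
        |s| s == "hello world\n",
    );
    println!("{:?}", result);
}
```

Polling keeps the helper simple; a `Condvar` signalled by the reader thread would avoid the sleep at the cost of more plumbing.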
    

    P.P.S. Note that await on a read call in async-std is blocking as well. It's just that instead of blocking a system thread it only blocks a chain of futures (essentially a stack-less green thread). The poll_read is the non-blocking interface. In async-std#499 I've asked the developers whether there's a short read guarantee from these APIs.
