How to perform non-blocking reads of stdout from a subprocess in Clojure?

Question

I want to spawn a long-running subprocess from Clojure and communicate with it via the standard streams.

Using the conch library, I can spawn the process and read data from its out stream:

(def my-process (sh/proc "my_dumb_process"))
; Read 10 lines from my-process's stdout. Blocks until 10 lines have been taken.
(take 10 (line-seq (clojure.java.io/reader (:out my-process))))

I want to invoke an asynchronous callback whenever my-process prints to stdout - whenever data is available in the stdout stream.

I'm a bit new to Clojure. Is there an idiomatic, Clojure-y way to do this? I've looked through core.async, which is nice, but I can't find a non-blocking solution for streams.

Answer 1:

Here is a sample shell script for our purposes. Make it executable and place it in the root of your Clojure project for easy testing:

$ cat dumb.sh
#!/bin/bash

for i in 1 2 3 4 5
do
    echo "Loop iteration $i"
    sleep 2
done

Now we define the process to execute, start it, get its stdout via (.getInputStream process), and read one line at a time until the stream ends. Each line is passed to the callback as soon as it is read, in real time.

(defn run-proc
  [proc-name arg-string callback]
  (let [pbuilder (ProcessBuilder. (into-array String [proc-name arg-string]))
        process (.start pbuilder)]
    (with-open [reader (clojure.java.io/reader (.getInputStream process))]
      (loop []
        (when-let [line (.readLine ^java.io.BufferedReader reader)]
          (callback line)
          (recur))))))

To test:

(run-proc "./dumb.sh" "" println)
Loop iteration 1
Loop iteration 2
Loop iteration 3
Loop iteration 4
Loop iteration 5
=> nil

This function blocks until the process exits, and each call to your callback blocks the read loop. If you want the callback to run on a separate thread, wrap the call in a future:

(future (callback line))
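
To keep the calling thread free as well, the whole read loop can run inside a future. The following is a minimal sketch, not part of the original answer: it assumes a POSIX sh is on the PATH, and run-proc-bg and the inline shell command are hypothetical names chosen for illustration.

```clojure
(require '[clojure.java.io :as io])

(defn run-proc-bg
  "Hypothetical helper: starts `command` (a vector of strings) and reads its
  stdout on a background thread, calling `callback` with each line.
  Returns a future that yields the process exit code."
  [command callback]
  (let [process (.start (ProcessBuilder. ^java.util.List command))]
    (future
      (with-open [reader (io/reader (.getInputStream process))]
        ;; line-seq ends when the process closes its stdout.
        (doseq [line (line-seq reader)]
          (callback line)))
      (.waitFor process))))

;; Usage: the call returns immediately; lines reach the callback as they arrive.
(def lines (atom []))
(def exit-code
  (run-proc-bg ["sh" "-c" "for i in 1 2 3; do echo \"Loop iteration $i\"; done"]
               #(swap! lines conj %)))

;; Dereferencing the future blocks until the process has exited.
@exit-code
@lines
```

The callback still runs on the reader thread here, so a slow callback delays later lines but never the caller.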

For a core.async-based approach:

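(require '[clojure.core.async :as async])

(defn run-proc-async
  [proc-name arg-string callback]
  ;; The (map callback) transducer applies callback to every line put on ch.
  (let [ch (async/chan 1000 (map callback))]
    (async/thread
      (let [pbuilder (ProcessBuilder. (into-array String [proc-name arg-string]))
            process (.start pbuilder)]
        (with-open [reader (clojure.java.io/reader (.getInputStream process))]
          (loop []
            (when-let [line (.readLine ^java.io.BufferedReader reader)]
              (async/>!! ch line)
              (recur)))))
      ;; Close the channel once the process output ends so takers see nil.
      (async/close! ch))
    ch))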

This applies your callback to each line through the (map callback) transducer on the channel, and the results are placed on the channel that the function returns:

(run-proc-async "./dumb.sh" "" #(let [cnt (count %)]
                                  (println "Counted" cnt "characters")
                                  cnt))

#object[clojure.core.async.impl.channels.ManyToManyChannel ...]
Counted 16 characters
Counted 16 characters
Counted 16 characters
Counted 16 characters
Counted 16 characters

(async/<!! *1)
=> 16

The channel in this example has a buffer of 1000, so unless you take from the channel, calls to >!! will block once 1000 lines have been read. Alternatively, you could use put! with a callback, but core.async allows at most 1024 pending puts per channel, and you should be processing the results anyway.
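
One way to keep the buffer from filling is to take from the channel in a go block. The following is a minimal sketch, assuming core.async is on the classpath and a POSIX sh is available. proc-lines-chan and consume-lines are hypothetical helpers written for this illustration, and the producer closes the channel when the process output ends so that the consumer can terminate.

```clojure
(require '[clojure.core.async :as async]
         '[clojure.java.io :as io])

(defn proc-lines-chan
  "Hypothetical helper: starts `command` (a vector of strings) and puts each
  stdout line on a channel with the given buffer size. The channel is closed
  when the process output ends."
  [command buf-size]
  (let [ch      (async/chan buf-size)
        process (.start (ProcessBuilder. ^java.util.List command))]
    (async/thread
      (with-open [reader (io/reader (.getInputStream process))]
        (doseq [line (line-seq reader)]
          ;; Blocks this reader thread (not the caller) when the buffer is full.
          (async/>!! ch line)))
      (async/close! ch))
    ch))

(defn consume-lines
  "Hypothetical helper: takes lines from `ch` in a go block and calls
  `callback` with each one. Returns a channel that closes once `ch` has been
  closed and drained. `callback` should not block."
  [ch callback]
  (async/go-loop []
    (when-some [line (async/<! ch)]
      (callback line)
      (recur))))

;; Usage: returns immediately; the callback fires as lines become available.
(def seen (atom []))
(def done
  (consume-lines (proc-lines-chan ["sh" "-c" "echo one; echo two"] 10)
                 #(swap! seen conj %)))

;; Block only when the caller actually needs to wait for the end of output.
(async/<!! done)
@seen
```

Because the consumer parks instead of blocking, it does not tie up a thread while the process is silent.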

Answer 2:

If you don't mind using a library, you can find a simple solution using lazy-gen and yield from the Tupelo library. It works like generator functions in Python:

(ns tst.demo.core
  (:use demo.core tupelo.test)
  (:require
    [clojure.java.io :as io]
    [tupelo.core :as t]
    [me.raynes.conch.low-level :as cll]
  ))
(t/refer-tupelo)

(dotest
  (let [proc          (cll/proc "dumb.sh")
        >>            (pretty proc)
        out-lines     (line-seq (io/reader (grab :out proc)))
        lazy-line-seq (lazy-gen
                        (doseq [line out-lines]
                          (yield line))) ]
    (doseq [curr-line lazy-line-seq]
      (spyx curr-line))))

Using the same dumb.sh as before, it yields this output:

{:out  #object[java.lang.UNIXProcess$ProcessPipeInputStream 0x465b16bb "java.lang.UNIXProcess$ProcessPipeInputStream@465b16bb"],
 :in   #object[java.lang.UNIXProcess$ProcessPipeOutputStream 0xfafbc63 "java.lang.UNIXProcess$ProcessPipeOutputStream@fafbc63"],
 :err  #object[java.lang.UNIXProcess$ProcessPipeInputStream 0x59bb8f80 "java.lang.UNIXProcess$ProcessPipeInputStream@59bb8f80"],
 :process  #object[java.lang.UNIXProcess 0x553c74cc "java.lang.UNIXProcess@553c74cc"]}

; one of these is printed every 2 seconds
curr-line => "Loop iteration 1"
curr-line => "Loop iteration 2"
curr-line => "Loop iteration 3"
curr-line => "Loop iteration 4"
curr-line => "Loop iteration 5"

Everything inside lazy-gen runs in a separate thread using core.async. The first doseq eagerly consumes the process output and places each line on the output lazy sequence using yield. The second doseq eagerly consumes the result of lazy-gen in the current thread and prints each line as soon as it is available.

Alternate solution:

An even simpler solution is to use a future, like so:

(dotest
  (let [proc          (cll/proc "dumb.sh")
        out-lines     (line-seq (io/reader (grab :out proc))) ]
    (future
      (doseq [curr-line out-lines]
        (spyx curr-line)))))

with the same results:

curr-line => "Loop iteration 1"
curr-line => "Loop iteration 2"
curr-line => "Loop iteration 3"
curr-line => "Loop iteration 4"
curr-line => "Loop iteration 5"

Source: https://stackoverflow.com/questions/45292625/how-to-perform-non-blocking-reading-stdout-from-a-subprocess-in-clojure

Tags: multithreading, asynchronous, clojure, process, ipc