JLine the contract for NonBlockingReader seems broken

Question

Follows on from my previous question about JLine. OS: W10, using Cygwin.

def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()
terminal.enterRawMode()
// NB the Terminal I get is class org.jline.terminal.impl.PosixSysTerminal
def reader = terminal.reader()
// class org.jline.utils.NonBlocking$NonBlockingInputStreamReader

def bytes = [] // NB class ArrayList
int readInt = -1
while( readInt != 13 && readInt != 10 ) {
    readInt = reader.read()
    byte convertedByte = (byte)readInt
    // see what the binary looks like:
    String binaryString = String.format("%8s", Integer.toBinaryString( convertedByte & 0xFF)).replace(' ', '0')
    println "binary |$binaryString|"
    bytes << (byte)readInt // NB means "append to list"

    // these seem to block forever, whatever the param... 
    // int peek = reader.peek( 50 ) 
    int peek = reader.peek( 0 )

}
// strip final byte (13 or 10)
bytes = bytes[0..-2]
def response = new String( (byte[])bytes.toArray(), 'UTF-8' )

According to the Javadoc (made locally from the source) peek looks like this:

public int peek(long timeout)

Peeks to see if there is a byte waiting in the input stream without actually consuming the byte.

Parameters: timeout - The amount of time to wait, 0 == forever Returns: -1 on eof, -2 if the timeout expired with no available input or the character that was read (without consuming it).

It doesn't say what time units are involved here... I assume milliseconds, but I also tried with "1", just in case it's seconds.

This peek command is sufficiently functional as it stands for you to be able to detect multi-byte Unicode input, with a bit of time-out ingenuity: one presumes the bytes of a multi-byte Unicode character will arrive faster than a person can type...

However, if it never unblocks this means that you have to put the peek command inside a time-out mechanism which you have to roll yourself. The next character input will of course unblock things. If this is an Enter the while loop will then end. But if, say, you wanted to print a character (or do anything) before the next character is input the fact that peek's timeout doesn't appear to work prevents you doing that.
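
The roll-your-own time-out mechanism mentioned above could look something like the following Java sketch. It does not use JLine at all: a daemon thread performs the blocking read() calls on any java.io.Reader and hands the results to a queue, which the caller then polls with a timeout. The class name TimedCharSource and its constants are hypothetical, invented for this illustration; the value -2 for a timeout merely mirrors the Javadoc quoted above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper, not part of JLine: adds a read-with-timeout on top of a blocking Reader. */
class TimedCharSource {
    static final int EOF = -1;
    static final int TIMED_OUT = -2; // mirrors the -2 described in the quoted Javadoc

    private final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

    TimedCharSource(Reader in) {
        // The pump thread is the only place where a blocking read() happens.
        Thread pump = new Thread(() -> {
            try {
                int c;
                while ((c = in.read()) != -1) {
                    queue.put(c);
                }
                queue.put(EOF);
            } catch (IOException e) {
                queue.offer(EOF); // treat an I/O failure as end of input
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "char-pump");
        pump.setDaemon(true); // do not keep the JVM alive while blocked on input
        pump.start();
    }

    /** Returns the next char, EOF, or TIMED_OUT if nothing arrives within timeoutMillis. */
    int read(long timeoutMillis) {
        try {
            Integer c = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (c == null) {
                return TIMED_OUT;
            }
            if (c == EOF) {
                queue.offer(EOF); // keep EOF "sticky" for later calls
            }
            return c;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return TIMED_OUT;
        }
    }

    public static void main(String[] args) {
        TimedCharSource src = new TimedCharSource(new StringReader("ab"));
        int c;
        while ((c = src.read(1000)) != EOF) {
            System.out.println(c == TIMED_OUT ? "timed out" : "read " + c);
        }
    }
}
```

With the Groovy code in the question, one would presumably pass terminal.reader() to the constructor and call read(50) where peek(50) was tried; that pairing is an assumption and has not been tested on Cygwin.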

Answer 1:

Try playing with

 jshell> "𐐷 ẃ".getBytes()
 $1 ==> byte[8] { -16, -112, -112, -73, 32, -31, -70, -125 }

 jshell> "𐐷 ẃ".chars().toArray()
 $2 ==> int[4] { 55297, 56375, 32, 7811 }

 jshell> "𐐷 ẃ".codePoints() .toArray()
 $3 ==> int[3] { 66615, 32, 7811 }

Answer 2:

JLine uses the usual Java semantics: streams deal in bytes, readers and writers deal in chars. The only piece that deals with code points (i.e. characters that may need up to 32 bits in a single value) is the BindingReader. The NonBlockingReader follows the Reader semantics, simply adding some methods with a timeout that can return -2 to indicate that the timeout expired.

If you want to do the decoding yourself, you need to use the Character.isHighSurrogate method, as done by the BindingReader: https://github.com/jline/jline3/blob/master/reader/src/main/java/org/jline/keymap/BindingReader.java#L124-L144

int s = 0;
int c = reader.read(100L);
if (c >= 0 && Character.isHighSurrogate((char) c)) {
    s = c;
    c = reader.read(100L);
}
return s != 0 ? Character.toCodePoint((char) s, (char) c) : c;
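
For experimenting away from a terminal, the same surrogate-combining idea can be tried against a plain java.io.Reader. The sketch below is an assumption-laden illustration rather than JLine's code: the class and method names (CodePointReader, readCodePoint) are made up, there is no timeout, and the input is assumed to be well-formed UTF-16. The test string is the one from Answer 1, written with \u escapes so that the source file encoding does not matter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: reads whole code points from a Reader that delivers UTF-16 chars. */
class CodePointReader {

    /** Returns the next code point, or -1 at end of input. Assumes well-formed surrogate pairs. */
    static int readCodePoint(Reader reader) {
        try {
            int c = reader.read();
            if (c >= 0 && Character.isHighSurrogate((char) c)) {
                int low = reader.read();
                if (low >= 0) {
                    // Combine the two UTF-16 code units into one code point.
                    return Character.toCodePoint((char) c, (char) low);
                }
                // Input ended after a lone high surrogate: fall through and return it as-is.
            }
            return c;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same text as in Answer 1: U+10437, space, U+1E83.
        Reader reader = new StringReader("\uD801\uDC37 \u1E83");
        int cp;
        while ((cp = readCodePoint(reader)) != -1) {
            System.out.println(cp); // prints 66615, 32, 7811 (one per line)
        }
    }
}
```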

Answer 3:

I have found a Cygwin-specific solution to this... and also what may be (?) the only way to intercept, isolate and identify "keyboard control" character input.

Getting correct Unicode input using JLine and Cygwin
As referenced here in my own answer to a question I asked a year ago, Cygwin (in my setup anyway) needs some sort of extra buffering and encoding, both for console input and output, if it is to handle Unicode properly.

To apply this AND to apply JLine at the same time, I do this, after calling terminal.enterRawMode():

BufferedReader br = new BufferedReader( new InputStreamReader( terminal.input(), 'UTF-8' ))

NB terminal.input() returns an org.jline.utils.NonBlockingInputStream instance.

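Entering "ẃ" (AltGr + W on a UK Extended keyboard) is then consumed in one br.read() command, and the int value produced is 7811, the correct code point value. Hurrah: a multi-byte UTF-8 character has been correctly consumed. (Strictly speaking, 7811 is U+1E83, which lies inside the BMP (Basic Multilingual Plane); a character outside the BMP would arrive from read() as two chars, a surrogate pair.)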
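
The decoding step can be checked in isolation, without a terminal. The following Java sketch feeds the three UTF-8 bytes of "ẃ" (the values -31, -70, -125 shown in Answer 1) through an InputStreamReader and reads a single char. The class and method names are hypothetical, and a ByteArrayInputStream stands in for terminal.input().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: one read() on a UTF-8 InputStreamReader consumes a whole multi-byte sequence. */
class Utf8ReadDemo {

    /** Decodes the given bytes as UTF-8 and returns the first char (UTF-16 code unit), or -1 if empty. */
    static int firstChar(byte[] utf8) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            return reader.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wAcute = { -31, -70, -125 };      // UTF-8 encoding of U+1E83
        System.out.println(firstChar(wAcute));   // prints 7811
    }
}
```

The same call on the four bytes of "𐐷" from Answer 1 returns 55297, the high surrogate, which is why characters outside the BMP still need the surrogate handling described in Answer 2.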

Handling keyboard control character bytes:
But I also want to intercept, isolate and correctly identify bytes corresponding to various control characters. TAB is one-byte (9), BACKSPACE is one-byte (127), so easy to deal with, but UP-ARROW is delivered in the form of 3 separately-read bytes, i.e. three separate br.read() commands are unblocked, even using the above BufferedReader. Some control sequences contain 7 such bytes, e.g. Ctrl-Shift-F5 is 27 (escape) followed by 6 other separately read bytes, int values: 91, 49, 53, 59, 54, 126. I haven't yet found where such sequences may be documented: if anyone knows please add a comment.

It is then necessary to isolate these "grouped bytes": i.e. you have a stream of bytes: how do you know that these 3 (or 7...) have to be interpreted jointly?

This is possible by taking advantage of the fact that when multiple bytes are delivered for a single such control character they are delivered with less than one millisecond between each. Not that surprising, perhaps. This Groovy script seems to work for my purposes:

import org.apache.commons.lang3.StringUtils
@Grab(group='org.jline', module='jline', version='3.7.0')
@Grab(group='org.apache.commons', module='commons-lang3', version='3.7')
def terminal = org.jline.terminal.TerminalBuilder.builder().jna( true ).system( true ).build()

terminal.enterRawMode()
// BufferedReader needed for correct Unicode input using Cygwin
BufferedReader br = new BufferedReader( new InputStreamReader(terminal.input(), 'UTF-8' ))
// PrintStream needed for correct Unicode output using Cygwin
outPS = new PrintStream(System.out, true, 'UTF-8' )
userResponse = ''
int readInt
boolean continueLoop = true

while( continueLoop ) {
    readInt = br.read()
    while( readInt == 27 ) {
        println "escape"
        long startNano = System.nanoTime()
        long nanoDiff = 0
        // figure of 500000 nanoseconds arrived at by experimentation: see below
        while( nanoDiff < 500000 ) {
            readInt = br.read()  
            long timeNow = System.nanoTime()
            nanoDiff = timeNow - startNano
            println "z readInt $readInt char ${(char)readInt} nanoDiff $nanoDiff"
            startNano = timeNow
        }
    }
    switch( readInt ) {
        case [10, 13]:
            println ''
            continueLoop = false
            break
        case 9:
            println '...TAB'
            continueLoop = false
            break
        case 127:
            // backspace
            if( ! userResponse.empty ) {
                print '\b \b'
                // chop off last character
                userResponse = StringUtils.chop( userResponse )
            }
            break
        default:
            char unicodeChar = (char)readInt
            outPS.print( unicodeChar )
            userResponse += unicodeChar
    }
}
outPS.print( "userResponse |$userResponse|")
br.close()
terminal.close()

The above code enables me to successfully "isolate" the individual multi-byte keyboard control characters.

The 3 dots in the println "...TAB" line are printed on the same line, immediately after the user has pressed TAB (which with the above code is not printed on the input line). This opens the door to doing things like "autocompletion" of lines as in certain BASH commands...

Is this setting of 500000 nanoseconds (0.5 ms) fast enough? Maybe!

The fastest typists can type at 220 words per minute. Assuming an average of 8 characters per word (which seems high), this works out at about 29 characters per second, or approximately 34 ms per character. In theory things should be OK. But a "rogue" pressing of two keys simultaneously might possibly mean they arrive less than 0.5 ms apart... however, with the above code this only matters if both of these are escape sequences. It seems to work OK. According to my experiments the threshold can't really be much less than 500000 ns, because it can take up to 70000 - 80000 ns between each byte in a multi-byte sequence (although it usually takes less)... and all sorts of interrupts or funny things happening might of course interfere with delivery of these bytes. In fact setting it to 1000000 (1 ms) seems to work fine.

NB we now seem to have a problem with the above code if we want to intercept and deal with escape sequences: the code blocks on br.read() inside the nanoDiff while loop at the end of the escape sequence. This is OK though, because we can track the byte sequence we are receiving as that while loop runs (before it blocks).
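
As an alternative to timing (a swapped-in technique, not what the script above does), the byte sequence can also be tracked by its structure. The Ctrl-Shift-F5 sequence quoted earlier (27, 91, 49, 53, 59, 54, 126) has the shape of a CSI control sequence as described in ECMA-48: ESC, then '[', then parameter and intermediate bytes in the range 0x20 to 0x3F, then one final byte in the range 0x40 to 0x7E. The Java sketch below groups an already-collected array of values on that basis. The class name CsiGrouper is hypothetical, and the sketch assumes that every escape sequence of interest is a CSI sequence, which may not hold for all keys or terminals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups input values so that each CSI sequence becomes a single token. */
class CsiGrouper {

    static final int ESC = 27;

    /** Returns one list per token: a whole CSI sequence, or a single ordinary value. */
    static List<List<Integer>> group(int[] input) {
        List<List<Integer>> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length) {
            List<Integer> token = new ArrayList<>();
            token.add(input[i]);
            // ESC followed by '[' opens a CSI sequence.
            if (input[i] == ESC && i + 1 < input.length && input[i + 1] == '[') {
                token.add(input[++i]);
                // Collect bytes up to and including the final byte (0x40..0x7E).
                while (i + 1 < input.length) {
                    int b = input[++i];
                    token.add(b);
                    if (b >= 0x40 && b <= 0x7E) {
                        break;
                    }
                }
            }
            tokens.add(token);
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 'a', then the Ctrl-Shift-F5 sequence quoted above, then TAB.
        int[] input = { 97, 27, 91, 49, 53, 59, 54, 126, 9 };
        System.out.println(group(input));
        // prints [[97], [27, 91, 49, 53, 59, 54, 126], [9]]
    }
}
```

On a live stream a lone ESC keypress is still indistinguishable from the start of a sequence until the next value arrives, so some form of timeout would probably remain necessary for that one case.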

Source: https://stackoverflow.com/questions/50016620/jline-the-contract-for-nonblockingreader-seems-broken

Tags

java

peek

jline3