Chunked Parsing with FParsec

甜味超标 2020-12-19 11:51

Is it possible to submit input to an FParsec parser in chunks, as from a socket? If not, is it possible to retrieve the current result and unparsed portion of an input strea

1 Answer
  •  伪装坚强ぢ
    2020-12-19 12:50

    The normal version of FParsec (though not the Low-Trust version) reads the input chunk-wise, or "block-wise", as I call it in the CharStream documentation. Thus, if you construct a CharStream from a System.IO.Stream and the content is large enough to span multiple CharStream blocks, you can start parsing before you've fully retrieved the input.

    Note, however, that the CharStream consumes the input stream in chunks of a fixed (but configurable) size, i.e. it calls the Read method of the System.IO.Stream as often as necessary to fill a complete block. Hence, if you parse the input faster than new input arrives, the CharStream may block even though some unparsed input is already available, because there is not yet enough input to fill a complete block.
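
As a rough illustration of parsing straight from a System.IO.Stream, here is a minimal sketch. It assumes FParsec is referenced and that the input is UTF-8 text containing whitespace-separated integers. The names `numbers` and `parseFromStream` are hypothetical, and a MemoryStream stands in for a socket stream.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical grammar for illustration: whitespace-separated integers.
let numbers : Parser<int32 list, unit> =
    spaces >>. many (pint32 .>> spaces) .>> eof

// runParserOnStream constructs a CharStream over the Stream and pulls
// bytes from it via Stream.Read while the parser advances.
let parseFromStream (input: Stream) =
    match runParserOnStream numbers () "input" input Encoding.UTF8 with
    | Success (values, _, _) -> Some values
    | Failure (_, _, _) -> None

// Assumption: a MemoryStream stands in for a NetworkStream here.
let demoInput = new MemoryStream(Encoding.UTF8.GetBytes("1 2 3\n4 5")) :> Stream
let demoResult = parseFromStream demoInput
```

With a real socket stream the call has the same shape, but `parseFromStream` would not return until the stream reports its end, and it may wait inside Read while a block is being filled, as described above.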

    Update

    The answer(s) to your ultimate questions: 42.

    • How you implement the Stream from which you construct the CharStream is entirely up to you. The restriction you're remembering that excludes parallel access only applies to the CharStream class, which isn't thread safe.

    • Implementing the Stream as a circular buffer will likely restrict the maximum distance over which you can backtrack.

    • The block size of the CharStream influences how far you can backtrack when the Stream does not support seeking.

    • The simplest way to parse input asynchronously is to do the parsing in an async task (i.e. on a background thread). In the task you could simply read the socket synchronously, or, if you don't trust the buffering by the OS, you could use a stream class like the BlockingStream described in the article you linked in the second comment below.

    • If the input can be easily separated into independent chunks (e.g. lines for a line-based text format), it might be more efficient to chunk it up yourself and then parse the input chunk by chunk.
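
The last two points can be sketched together: read lines synchronously and parse each line on its own, with the whole loop running as an async task. This is only a sketch under assumptions, not a definitive implementation. The line grammar (comma-separated integers) and the names `lineParser`, `parseLine`, `parseLines` and `parseInBackground` are hypothetical.

```fsharp
open System.IO
open System.Text
open FParsec

// Hypothetical line grammar for illustration: comma-separated integers.
let lineParser : Parser<int32 list, unit> =
    sepBy1 pint32 (pchar ',') .>> eof

// Parse one self-contained chunk (a single line) with a fresh parser run.
let parseLine (line: string) =
    match run lineParser line with
    | Success (values, _, _) -> Some values
    | Failure (_, _, _) -> None

// Read lines synchronously and parse each one independently,
// so no parser state or backtracking ever spans two lines.
let parseLines (input: Stream) =
    use reader = new StreamReader(input, Encoding.UTF8)
    [ while not reader.EndOfStream do
        yield parseLine (reader.ReadLine()) ]

// Run the blocking read-and-parse loop as a task on the thread pool.
let parseInBackground (input: Stream) =
    async { return parseLines input } |> Async.StartAsTask
```

A caller could then start `parseInBackground stream` and await or poll the returned task while the reading thread blocks on the socket. Because every line is parsed separately, a malformed line yields `None` for that line only and does not affect the others.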
