How to properly parse streaming JSON with Jackson?

Submitted by …衆ロ難τιáo~ on 2019-12-12 18:17:26

Question


I'm trying to figure out a clean way to parse streaming JSON with Jackson. "Streaming" as in TCP, off-the-wire, in a piecemeal fashion without any guarantee of receiving complete JSON data in a single read (no message framing either). Also, the goal is to do this asynchronously, which rules out relying on Jackson's handling of java.io.InputStreams.

I came up with a functioning solution (see demonstration below), but I'm not particularly happy with it. Imperative style aside, I don't like the ungraceful handling of incomplete JSON by JsonParser#readValueAsTree. When processing a stream of bytes, incomplete data is absolutely normal and is not an exceptional scenario, so it's strange (and unacceptable) to see java.io.IOExceptions in Jackson's APIs.

I also looked into using Jackson's TokenBuffer, but ran into similar issues. Is Jackson not really meant for processing true streaming JSON?

package com.example.jackson;

import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

import static java.nio.charset.StandardCharsets.UTF_8;
import static java.util.Collections.emptyList;

public class AsyncJsonParsing {
    public static void main(String[] args) {
        final AsyncJsonParsing parsing = new AsyncJsonParsing();

        parsing.runFirstScenario();
        parsing.runSecondScenario();
        parsing.runThirdScenario();
        parsing.runFourthScenario();
    }



    static final class ParsingOutcome {
        final List<JsonNode> roots; // parsed JSON objects and JSON arrays, in arrival order
        final byte[] remainder;     // trailing bytes that do not yet form a complete JSON value

        ParsingOutcome(final List<JsonNode> roots, final byte[] remainder) {
            this.roots = roots;
            this.remainder = remainder;
        }
    }

    final byte[] firstMessage = "{\"message\":\"first\"}".getBytes(UTF_8);
    final byte[] secondMessage = "{\"message\":\"second\"}".getBytes(UTF_8);

    final byte[] leadingHalfOfFirstMessage = Arrays.copyOfRange(firstMessage, 0, firstMessage.length / 2);
    final byte[] trailingHalfOfFirstMessage = Arrays.copyOfRange(firstMessage, firstMessage.length / 2, firstMessage.length);

    final byte[] leadingHalfOfSecondMessage = Arrays.copyOfRange(secondMessage, 0, secondMessage.length / 2);
    final byte[] trailingHalfOfSecondMessage = Arrays.copyOfRange(secondMessage, secondMessage.length / 2, secondMessage.length);

    final ObjectMapper mapper = new ObjectMapper();

    void runFirstScenario() {
        //expectation: remainder = empty array and roots has a single element - parsed firstMessage
        final ParsingOutcome result = parse(firstMessage, mapper);
        report(result);
    }

    void runSecondScenario() {
        //expectation: remainder = leadingHalfOfFirstMessage and roots is empty
        final ParsingOutcome firstResult = parse(leadingHalfOfFirstMessage, mapper);
        report(firstResult);

        //expectation: remainder = empty array and roots has a single element - parsed firstMessage
        final ParsingOutcome secondResult = parse(concat(firstResult.remainder, trailingHalfOfFirstMessage), mapper);
        report(secondResult);
    }

    void runThirdScenario() {
        //expectation: remainder = leadingHalfOfSecondMessage and roots has a single element - parsed firstMessage
        final ParsingOutcome firstResult = parse(concat(firstMessage, leadingHalfOfSecondMessage), mapper);
        report(firstResult);

        //expectation: remainder = empty array and roots has a single element - parsed secondMessage
        final ParsingOutcome secondResult = parse(concat(firstResult.remainder, trailingHalfOfSecondMessage), mapper);
        report(secondResult);
    }

    void runFourthScenario() {
        //expectation: remainder = empty array and roots has two elements - parsed firstMessage, followed by parsed secondMessage
        final ParsingOutcome result = parse(concat(firstMessage, secondMessage), mapper);
        report(result);
    }

    static void report(final ParsingOutcome result) {
        System.out.printf("Remainder of length %d: %s%n", result.remainder.length, Arrays.toString(result.remainder));
        System.out.printf("Total of %d parsed JSON roots: %s%n", result.roots.size(), result.roots);
    }

    static byte[] concat(final byte[] left, final byte[] right) {
        final byte[] union = Arrays.copyOf(left, left.length + right.length);
        System.arraycopy(right, 0, union, left.length, right.length);
        return union;
    }

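    static ParsingOutcome parse(final byte[] chunk, final ObjectMapper mapper) {
        final List<JsonNode> roots = new LinkedList<>();

        JsonParser parser;
        JsonNode root;
        try {
            parser = mapper.getFactory().createParser(chunk);
            root = parser.readValueAsTree();
        } catch (IOException e) {
            // The chunk does not contain even one complete root: hand all of it back as the remainder.
            // Malformed JSON also lands here and is indistinguishable from merely incomplete JSON.
            return new ParsingOutcome(emptyList(), chunk);
        }

        byte[] remainder = new byte[0];
        try {
            while (root != null) {
                roots.add(root);
                // Snapshot the unread bytes before attempting the next root, so that a failure
                // below leaves the remainder starting just past the last complete root.
                remainder = extractRemainder(parser);
                root = parser.readValueAsTree();
            }
        } catch (IOException e) {
            // The trailing root is incomplete: keep the roots parsed so far and the last snapshot.
        }

        return new ParsingOutcome(roots, remainder);
    }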

    static byte[] extractRemainder(final JsonParser parser) {
        try {
            final ByteArrayOutputStream baos = new ByteArrayOutputStream();
            parser.releaseBuffered(baos);
            return baos.toByteArray();
        } catch (IOException e) {
            return new byte[0];
        }
    }
}

To elaborate a bit further, conceptually (at least in my mind), parsing of any streaming data boils down to a simple function which accepts an array of bytes and returns a tuple of (1) a possibly empty list of parsed results and (2) an array of remaining, currently-unparsable bytes. In the snippet above, this tuple is represented by an instance of ParsingOutcome.
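
For illustration only, the same "bytes in, (results, remainder) out" shape can be sketched without relying on exceptions at all, by locating document boundaries before any parsing takes place. The class below is a hypothetical helper and not part of Jackson; the names JsonFramer, Outcome and split are made up for this sketch. It assumes every root is a JSON object or array (matching the comment on ParsingOutcome.roots), it does not validate the JSON, and it never frames top-level scalars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static java.nio.charset.StandardCharsets.UTF_8;

/** Hypothetical helper (not a Jackson API): finds complete top-level JSON objects and arrays. */
final class JsonFramer {

    /** Same shape as ParsingOutcome, but holding raw, not yet parsed, documents. */
    static final class Outcome {
        final List<byte[]> documents; // each element is one complete object or array
        final byte[] remainder;       // bytes that do not yet form a complete document

        Outcome(final List<byte[]> documents, final byte[] remainder) {
            this.documents = documents;
            this.remainder = remainder;
        }
    }

    static Outcome split(final byte[] chunk) {
        final List<byte[]> documents = new ArrayList<>();
        int depth = 0;            // current nesting level of {} and []
        boolean inString = false; // true while inside a "..." literal
        boolean escaped = false;  // true if the previous byte was a backslash inside a string
        int start = 0;            // index where the current, still open, document begins

        for (int i = 0; i < chunk.length; i++) {
            final byte b = chunk[i];
            if (inString) {
                // Brackets inside string literals must not affect the nesting depth.
                if (escaped) {
                    escaped = false;
                } else if (b == '\\') {
                    escaped = true;
                } else if (b == '"') {
                    inString = false;
                }
            } else if (b == '"') {
                inString = true;
            } else if (b == '{' || b == '[') {
                depth++;
            } else if (b == '}' || b == ']') {
                // Unbalanced closing brackets are ignored; this sketch does not validate.
                if (depth > 0 && --depth == 0) {
                    documents.add(Arrays.copyOfRange(chunk, start, i + 1));
                    start = i + 1;
                }
            } else if (depth == 0 && i == start
                    && (b == ' ' || b == '\n' || b == '\r' || b == '\t')) {
                start = i + 1; // skip whitespace between documents
            }
        }
        return new Outcome(documents, Arrays.copyOfRange(chunk, start, chunk.length));
    }

    public static void main(final String[] args) {
        final byte[] chunk = "{\"message\":\"first\"}{\"message\":\"sec".getBytes(UTF_8);
        final Outcome outcome = split(chunk);
        System.out.println(outcome.documents.size() + " complete, "
                + outcome.remainder.length + " bytes pending");
    }
}
```

Each element of documents could then be handed to mapper.readTree(byte[]), where an IOException would signal genuinely malformed input rather than a short read. As with parse above, the caller prepends the remainder to the next chunk, so pending bytes are rescanned on every call; a stateful variant that remembers depth and string state between calls would avoid that cost.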

Source: https://stackoverflow.com/questions/38416158/how-to-properly-parse-streaming-json-with-jackson
