Split a list of JSON blobs delimited by commas (ignoring commas inside a JSON blob) [duplicate]

Posted by 烈酒焚心 on 2020-03-28 06:55:56

Question


Here's a weird one. I'm given an ill-conceived input string that is a list of JSON blobs, separated by commas, e.g.:

string input = "{<some JSON object>},{JSON_2},{JSON_3},...,{JSON_n}"

And I have to convert this to an actual list of JSON strings (List<string>).

For context, the unsanitary "input" list of JSONs is read in directly from a .txt file on disk, produced by some other software. I'm writing an "adapter" to allow this data to be consumed by another piece of software that knows how to interpret the individual JSON objects contained within the list. Ideally, the original software could have output one file per JSON object.


The "obvious" solution (using String.Split):

List<string> split = input.Split(',').ToList();

would of course also split on the commas inside the JSON objects ({}) themselves.


I was considering a manual approach - walking the string character-by-character and only splitting out a new element if the count of { is equal to the count of }. Something like:

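List<string> JsonBlobs = new List<string>();
int start = 0, nestingLevel = 0;
for (int i = 0; i < input.Length; i++)
{
    // Track how deeply nested we are inside { } pairs.
    // Note: braces and commas inside JSON string values are not handled here.
    if (input[i] == '{') nestingLevel++;
    else if (input[i] == '}') nestingLevel--;
    else if (input[i] == ',' && nestingLevel == 0)
    {
        // A comma at nesting level 0 separates two blobs.
        JsonBlobs.Add(input.Substring(start, i - start));
        start = i + 1;
    }
}
// The last blob is not followed by a comma, so add it after the loop.
if (start < input.Length) JsonBlobs.Add(input.Substring(start));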

(The above likely contains bugs)


I had also considered wrapping the string in JSON array brackets ([ and ]) and letting a JSON serializer deserialize it as a JSON array, then re-serializing each of the array elements one at a time:

List<string> JsonBlobs = Newtonsoft.Json.Linq.JArray.Parse("[" + input + "]")
    .Select<Newtonsoft.Json.Linq.JToken, string>(token => token.ToString()).ToList();

But this seems overly expensive, and the newly serialized JSON strings might not be exactly equal to the original string contents (whitespace and formatting could differ, for example).


Any better suggestions?

I'd prefer an easily understandable solution that uses built-in libraries and/or LINQ if possible. Regex would be a last resort, although nifty regex solutions would also be interesting to see.


Answer 1:


Trying to parse this out using your own rules is fraught. You noticed the problem where JSON properties are comma-separated, but also bear in mind that JSON values can include strings, which could contain braces and commas, and even quote characters that have nothing to do with the JSON structure.

{"John's comment": "I was all like, \"no way!\" :-}"}

To do it right, you're going to need to write a parser capable of handling all the JSON rules. You're likely to make mistakes, and unlikely to get much value out of the effort you put into it.
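To illustrate how much extra state even a partial attempt needs, here is a minimal sketch of a scan that also tracks string literals and backslash escapes. It is not a full JSON parser: it assumes the input is well-formed and does no validation, and the names JsonBlobSplitter and SplitTopLevel are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class JsonBlobSplitter
{
    // Hypothetical helper: splits "{...},{...}" at top-level commas only.
    // Assumes well-formed JSON; malformed input is not detected.
    public static List<string> SplitTopLevel(string input)
    {
        var blobs = new List<string>();
        int start = 0, depth = 0;
        bool inString = false, escaped = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (inString)
            {
                // Inside a string literal, only escapes and the closing quote matter.
                if (escaped) escaped = false;
                else if (c == '\\') escaped = true;
                else if (c == '"') inString = false;
            }
            else if (c == '"') inString = true;
            else if (c == '{' || c == '[') depth++;
            else if (c == '}' || c == ']') depth--;
            else if (c == ',' && depth == 0)
            {
                // A comma outside every object, array, and string separates two blobs.
                blobs.Add(input.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }

        // The final blob is not followed by a comma.
        string last = input.Substring(start).Trim();
        if (last.Length > 0) blobs.Add(last);
        return blobs;
    }
}
```

Given the example object above followed by ,{"a":[1,2]}, this returns two strings, because the brace, commas, and escaped quotes inside the comment value are skipped. It still trusts the input completely, which is the main argument for using a real parser instead.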

I would personally suggest the approach of adding brackets on either side of the string and deserializing the whole thing as a JSON array.
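A minimal sketch of that approach follows. It swaps in System.Text.Json (built into .NET Core 3.0 and later) rather than the Newtonsoft.Json call from the question, which is my assumption, not something the question asked for. The reason is that JsonElement.GetRawText() is documented to return the original input text backing each element, so the strings should not be reformatted. The names JsonArraySplitter and SplitViaArray are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class JsonArraySplitter
{
    // Hypothetical helper: wraps the input in [ ] and lets the parser find the element boundaries.
    public static List<string> SplitViaArray(string input)
    {
        var blobs = new List<string>();

        // JsonDocument is IDisposable; it throws a JsonException on malformed input.
        using (JsonDocument doc = JsonDocument.Parse("[" + input + "]"))
        {
            foreach (JsonElement element in doc.RootElement.EnumerateArray())
            {
                // GetRawText returns the original text of the element rather than re-serializing it.
                blobs.Add(element.GetRawText());
            }
        }
        return blobs;
    }
}
```

As a side effect, malformed input fails loudly with an exception instead of being split silently into broken pieces.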

I'd also suggest questioning the requirement to convert the result to a list of strings: Was that requirement based on someone's assumption that producing a list of strings would be simpler than producing a list of JObjects or a list of some specific serialized type?




Answer 2:


You can try splitting on:

(?<=}),(?={)

but this of course assumes that a JSON string does not literally contain a sequence of },{ such as:

{"key":"For whatever reason, },{ literally exists in this string"}

it would also fail for an array of objects such as:

{"key1":[{"key2":"value2"},{"key3":"value3"}]}

:-/
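For reference, a minimal sketch of the split using Regex.Split. The braces are escaped here for readability, which should not change what the pattern matches; the class name RegexSplitDemo is made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Same pattern as above: a comma preceded by } and followed by {.
    private const string Pattern = @"(?<=\}),(?=\{)";

    // Splits on every },{ boundary, whether or not it is at the top level.
    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern);
    }
}
```

Calling RegexSplitDemo.Split on {"a":1},{"b":2} gives the two objects as expected. Calling it on the array example above also gives two pieces, even though the input is a single object, and neither piece is valid JSON on its own.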



Source: https://stackoverflow.com/questions/60287376/split-a-list-of-json-blobs-delimited-by-commas-ignoring-commas-inside-a-json-bl
