Extract JSON from text

猫巷女王i 2020-12-11 16:00

An AJAX call is returning a response text that includes a JSON string. I need to:

  1. extract the JSON string
  2. modify it
  3. then reinsert it to updat
3 answers
  • 2020-12-11 16:15

    For anyone else looking (as I was) for a way to extract JSON-like strings from arbitrary text, even when they are not valid JSON, take a look at this Gulp plugin: https://www.npmjs.com/package/gulp-extract-json-like. It searches for all strings that appear to be formatted like JSON.

    Create a folder and install the packages.

    mkdir project && cd project
    npm install gulp gulp-extract-json-like
    

    Create a file ./gulpfile.js and put the following content into it:

    var gulp = require('gulp');
    var extractJsonLike = require('gulp-extract-json-like');
    
    gulp.task('default', function () {
      return gulp.src('file.txt')
        .pipe(extractJsonLike())
        .pipe(gulp.dest('dist'));
    });
    

    Create a file called ./file.txt that contains your text, then run the following command:

    gulp
    

    The JSON strings that were found will be written to ./dist/file.txt.

  • 2020-12-11 16:30

    You cannot use a regex to extract JSON from arbitrary text. Regexes are usually not powerful enough to validate JSON (unless you can use PCRE), so they cannot match it either: if they could match it, they could also validate it.

    However, if you know that the top-level element of your JSON is always an object or array, you can use the following approach:

    1. Find the first opening brace ({ or [) and the last closing brace (} or ]) in your string.
    2. Try to parse that block of text (including the braces) using JSON.parse(). If it succeeds, return the parsed result.
    3. Otherwise, take the previous closing brace and try parsing the shorter string. If it succeeds, you are done.
    4. Repeat step 3 until there is no closing brace left, or the closing brace comes before the current opening brace.
    5. Find the next opening brace after the current one. If there is none, the string does not contain a JSON object/array and you can stop.
    6. Go to step 2.

    Here is a function that extracts a JSON object and returns the object together with its position in the string. If you really need top-level arrays, too, it should be easy to extend:

    function extractJSON(str) {
        var firstOpen, firstClose, candidate;
        // Start at the first '{'; without one there is nothing to extract.
        firstOpen = str.indexOf('{');
        if(firstOpen === -1) {
            return null;
        }
        do {
            // For each opening brace, start from the last '}' in the string.
            firstClose = str.lastIndexOf('}');
            console.log('firstOpen: ' + firstOpen, 'firstClose: ' + firstClose);
            if(firstClose <= firstOpen) {
                return null;
            }
            do {
                candidate = str.substring(firstOpen, firstClose + 1);
                console.log('candidate: ' + candidate);
                try {
                    var res = JSON.parse(candidate);
                    console.log('...found');
                    // Parsed value plus its start (inclusive) and end (exclusive) offsets.
                    return [res, firstOpen, firstClose + 1];
                }
                catch(e) {
                    console.log('...failed');
                }
                // Shrink the candidate to the previous '}'.
                firstClose = str.lastIndexOf('}', firstClose - 1);
            } while(firstClose > firstOpen);
            firstOpen = str.indexOf('{', firstOpen + 1);
        } while(firstOpen != -1);
        // No candidate parsed as JSON.
        return null;
    }
    
    var obj = {'foo': 'bar', xxx: '} me[ow]'};
    var str = 'blah blah { not {json but here is json: ' + JSON.stringify(obj) + ' and here we have stuff that is } really } not ] json }} at all';
    var result = extractJSON(str);
    console.log('extracted object:', result[0]);
    console.log('expected object :', obj);
    console.log('did it work     ?', JSON.stringify(result[0]) == JSON.stringify(obj) ? 'yes!' : 'no');
    console.log('surrounding str :', str.substring(0, result[1]) + '<JSON>' + str.substring(result[2]));
    

    Demo (executed in the nodejs environment, but should work in a browser, too): https://paste.aeum.net/show/81/
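
    The question also asks for modifying the JSON and reinserting it, and the answer mentions that top-level arrays should be easy to add. The following is a minimal sketch of both, not the answerer's code: findJSON and updateJSON are hypothetical helper names, and the approach is the same "try to parse, then shrink the candidate" loop described in the steps above.

```javascript
// Hypothetical helper (not from the answer above): finds the first substring
// that starts with '{' or '[' and parses as JSON.
// Returns { value, start, end } (end is exclusive) or null if nothing parses.
function findJSON(text) {
  const closers = { '{': '}', '[': ']' };
  for (let start = 0; start < text.length; start++) {
    const closer = closers[text[start]];
    if (!closer) continue; // not an opening brace or bracket
    // Try the longest candidate first, then shrink to the previous closer.
    let end = text.lastIndexOf(closer);
    while (end > start) {
      try {
        const value = JSON.parse(text.slice(start, end + 1));
        return { value, start, end: end + 1 };
      } catch (e) {
        end = text.lastIndexOf(closer, end - 1);
      }
    }
  }
  return null;
}

// Hypothetical helper: applies `modify` to the embedded JSON and splices the
// re-serialized result back into the surrounding text.
function updateJSON(text, modify) {
  const found = findJSON(text);
  if (found === null) return text; // nothing to update
  const updated = JSON.stringify(modify(found.value));
  return text.slice(0, found.start) + updated + text.slice(found.end);
}

const response = 'status: ok; payload: {"count": 1, "tags": ["a", "b"]} (end)';
console.log(updateJSON(response, (data) => ({ ...data, count: data.count + 1 })));
// status: ok; payload: {"count":2,"tags":["a","b"]} (end)
```

    Note that once arrays are allowed, any bracketed text that happens to parse, such as a footnote marker like [1], is returned first if it appears before the payload. The reinserted JSON is also re-serialized, so its original whitespace is not preserved.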

  • 2020-12-11 16:31

    If the JSON is returned as part of an AJAX response, why not use the browser's native JSON parsing (beware of gotchas)? Or jQuery's JSON parsing?

    If the JSON is totally mangled up with the text, that really reeks of a design issue IMHO - if you can change it, I would strongly recommend doing so (i.e. return a single JSON object as the response, with the text as a property of the object).
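
    If the response can be restructured that way, the client side becomes a plain parse, modify, serialize round trip. A minimal sketch, assuming a response shape of { "text": ..., "data": ... }; the property names and the updateResponse helper are hypothetical and not taken from the question:

```javascript
// Assumed (hypothetical) response shape: { "text": "...", "data": { ... } }
function updateResponse(responseText, modify) {
  // JSON.parse throws a SyntaxError if the response is not valid JSON.
  const response = JSON.parse(responseText);
  response.data = modify(response.data);
  return JSON.stringify(response);
}

const raw = '{"text": "3 items found", "data": {"count": 3}}';
console.log(updateResponse(raw, (data) => ({ ...data, count: data.count + 1 })));
// {"text":"3 items found","data":{"count":4}}
```

    No searching or guessing is needed here, because the whole response is one JSON document and the free text is just another property.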

    If not, then using RegEx is going to be an absolute nightmare. JSON is naturally very flexible, and ensuring accurate parsing is going to be not only time-consuming but wasteful. I would probably put in content markers at the start and end of the JSON and hope for the best. But you're going to be wide open to validation errors etc.
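
    The content-marker idea could look roughly like the sketch below. The marker strings <<<JSON and JSON>>> and the replaceMarkedJSON helper are made up for illustration; the server would have to emit whatever markers you agree on.

```javascript
// Hypothetical markers; the server must wrap the JSON payload in them.
const START = '<<<JSON';
const END = 'JSON>>>';

// Returns the text with the marked JSON modified, or null if the markers
// are missing or the payload between them is not valid JSON.
function replaceMarkedJSON(text, modify) {
  const start = text.indexOf(START);
  if (start === -1) return null;
  const bodyStart = start + START.length;
  const bodyEnd = text.indexOf(END, bodyStart);
  if (bodyEnd === -1) return null;
  let value;
  try {
    value = JSON.parse(text.slice(bodyStart, bodyEnd));
  } catch (e) {
    return null; // markers present, but the payload does not parse
  }
  const updated = JSON.stringify(modify(value));
  return text.slice(0, bodyStart) + updated + text.slice(bodyEnd);
}

const page = 'Hello <<<JSON{"n": 1}JSON>>> bye';
console.log(replaceMarkedJSON(page, (v) => ({ n: v.n + 1 })));
// Hello <<<JSON{"n":2}JSON>>> bye
```

    This still fails if the end marker can occur inside a JSON string value, which is one of the validation problems mentioned above.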
