An AJAX call is returning a response text that includes a JSON string. I need to:
For others who are looking (as I was) for a way to extract JSON-like strings from text in general (even if they are not valid JSON), take a look at the Gulp plugin gulp-extract-json-like: https://www.npmjs.com/package/gulp-extract-json-like. It searches for all strings that appear to be formatted like JSON.
Create a folder and install the packages:
    mkdir project && cd project
    npm install gulp gulp-extract-json-like
Create a file ./gulpfile.js with the following content:
    var gulp = require('gulp');
    var extractJsonLike = require('gulp-extract-json-like');
    gulp.task('default', function () {
        return gulp.src('file.txt')
            .pipe(extractJsonLike())
            .pipe(gulp.dest('dist'));
    });
Create a file called ./file.txt containing your text, then run the following command:
    gulp
The JSON strings that were found will be in ./dist/file.txt.
You cannot use a regex to extract JSON from arbitrary text. Regexes are usually not powerful enough to validate JSON (unless you can use PCRE), so they cannot reliably match it either: if they could match it, they could also validate it.
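To make that concrete, here is a small sketch of how the two obvious patterns go wrong. The input string and the helper name isValidJSON are made up for illustration; the point is only that neither a lazy nor a greedy brace match lines up with the real end of the object.

```javascript
// Hypothetical input: a nested object, with a brace inside a string value.
var text = 'status: {"user": {"name": "Ann"}, "note": "a } b"} done';

// Small helper (name is an assumption): true if s parses as JSON.
function isValidJSON(s) {
    try {
        JSON.parse(s);
        return true;
    } catch (e) {
        return false;
    }
}

// A lazy pattern stops at the first closing brace, cutting the object short.
var lazy = text.match(/\{[\s\S]*?\}/)[0];

// A greedy pattern runs to the last closing brace anywhere in the text.
var greedyText = text + ' trailing } brace';
var greedy = greedyText.match(/\{[\s\S]*\}/)[0];

console.log(lazy, isValidJSON(lazy));
console.log(greedy, isValidJSON(greedy));
```

Both candidates fail JSON.parse(): the lazy one ends after the inner object, the greedy one swallows the trailing text.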
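However, if you know that the top-level element of your JSON is always an object or array, you can use the following approach:
1. Find the first opening ({ or [) and the last closing (} or ]) brace in your string.
2. Try to parse that block of text (including the braces) using JSON.parse(). If it succeeds, finish and return the parsed result.
3. Otherwise, take the previous closing brace and try parsing that shorter block. Repeat until no closing brace is left after the current opening brace.
4. Move on to the next opening brace and go back to step 2. If there is none, the string does not contain a JSON object or array.
Here is a function that extracts a JSON object and returns the object together with its position. If you really need top-level arrays too, it should be easy to extend: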
    function extractJSON(str) {
        var firstOpen, firstClose, candidate;
        // Start at the first opening brace; without one there is nothing to parse.
        firstOpen = str.indexOf('{');
        while (firstOpen !== -1) {
            // For each opening brace, start from the last closing brace and work backwards.
            firstClose = str.lastIndexOf('}');
            if (firstClose <= firstOpen) {
                return null;
            }
            do {
                candidate = str.substring(firstOpen, firstClose + 1);
                try {
                    // Return the parsed object plus its start and end offsets in str.
                    return [JSON.parse(candidate), firstOpen, firstClose + 1];
                } catch (e) {
                    // Not valid JSON; retry with the previous closing brace.
                }
                firstClose = str.substring(0, firstClose).lastIndexOf('}');
            } while (firstClose > firstOpen);
            firstOpen = str.indexOf('{', firstOpen + 1);
        }
        // No opening brace started a valid JSON object.
        return null;
    }
    var obj = {'foo': 'bar', xxx: '} me[ow]'};
    var str = 'blah blah { not {json but here is json: ' + JSON.stringify(obj) + ' and here we have stuff that is } really } not ] json }} at all';
    var result = extractJSON(str);
    console.log('extracted object:', result[0]);
    console.log('expected object :', obj);
    console.log('did it work ?', JSON.stringify(result[0]) === JSON.stringify(obj) ? 'yes!' : 'no');
    // result[1] and result[2] are the start and end offsets of the JSON inside str.
    console.log('surrounding str :', str.substring(0, result[1]) + '<JSON>' + str.substring(result[2]));
Demo (executed in the nodejs environment, but should work in a browser, too): https://paste.aeum.net/show/81/
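As for top-level arrays: one way the same idea could be extended is sketched below. This is not part of the function above; the name extractJSONValue and the sample string are made up, and it simply pairs each opening brace or bracket with closers of the same kind, longest candidate first.

```javascript
// Hypothetical generalisation: try every '{' or '[' as a start and every
// closer of the same kind ('}' or ']') as an end, longest candidate first.
function extractJSONValue(str) {
    for (var open = 0; open < str.length; open++) {
        var ch = str[open];
        if (ch !== '{' && ch !== '[') {
            continue;
        }
        var closer = ch === '{' ? '}' : ']';
        var close = str.lastIndexOf(closer);
        while (close > open) {
            try {
                // Parsed value plus its start and end offsets in str.
                return [JSON.parse(str.substring(open, close + 1)), open, close + 1];
            } catch (e) {
                // Not valid JSON; shrink the candidate to the previous closer.
                close = str.lastIndexOf(closer, close - 1);
            }
        }
    }
    return null;
}

var sample = 'log [not json] then [1, {"a": "]"}, 3] and } more ]';
var found = extractJSONValue(sample);
console.log(found[0]);
console.log(sample.substring(found[1], found[2]));
```

Keep in mind that short bracketed fragments such as [1] are valid JSON on their own, so false positives are more likely once arrays are allowed.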
If the JSON is returned as part of an AJAX response, why not use the browser's native JSON parsing (beware of gotchas)? Or jQuery's JSON parsing?
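A minimal sketch of the native route, assuming the whole response body is supposed to be JSON. The helper name parseResponse is made up; the main gotcha it covers is that JSON.parse() throws a SyntaxError on anything that is not valid JSON, so the call needs a try/catch.

```javascript
// Hypothetical helper: returns { ok: true, value } on success and
// { ok: false, error } when the text is not valid JSON.
function parseResponse(responseText) {
    try {
        return { ok: true, value: JSON.parse(responseText) };
    } catch (e) {
        return { ok: false, error: e };
    }
}

var clean = parseResponse('{"saved": 3}');
var mixed = parseResponse('Server says: {"saved": 3}');

console.log(clean.ok, clean.value);
console.log(mixed.ok, mixed.error.name);
```

The second call fails because of the leading text, which is exactly the situation the question describes.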
If the JSON is totally mangled up with the text, that really reeks of a design issue IMHO - if you can change it, I would strongly recommend doing so (i.e. return a single JSON object as the response, with the text as a property of the object).
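For illustration, such a response could look something like this. The property names message and data are assumptions, not anything your server currently sends, and JSON.stringify() stands in for the server side here.

```javascript
// Assumed response shape: the free text travels as a property next to the data.
var responseText = JSON.stringify({
    message: 'Saved 3 records { braces in the text are harmless here }',
    data: { saved: 3, ids: [7, 8, 9] }
});

// On the client, a single JSON.parse() call recovers both parts.
var response = JSON.parse(responseText);
console.log(response.message);
console.log(response.data.ids);
```

Because the text is just a string value inside the JSON, any braces it contains are escaped by the encoder and never confuse the parser.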
If not, then using RegEx is going to be an absolute nightmare. JSON is naturally very flexible, and ensuring accurate parsing is going to be not only time-consuming but wasteful. I would probably put content markers at the start and end of the JSON and hope for the best, but you are going to be wide open to validation errors etc.
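A rough sketch of the marker idea, assuming you control the server output. The marker strings and the function name extractMarked are made up for illustration; pick markers that cannot occur in your own text or data.

```javascript
// Hypothetical markers that the server would wrap around the JSON payload.
var START = '<<<JSON';
var END = 'JSON>>>';

// Returns the parsed payload between the markers, or null if the markers
// are missing or the text between them is not valid JSON.
function extractMarked(text) {
    var start = text.indexOf(START);
    if (start === -1) {
        return null;
    }
    var end = text.indexOf(END, start + START.length);
    if (end === -1) {
        return null;
    }
    try {
        return JSON.parse(text.substring(start + START.length, end));
    } catch (e) {
        return null;
    }
}

var body = 'Some text } with braces <<<JSON{"saved": 3}JSON>>> and more { text';
console.log(extractMarked(body));
```

This still breaks if the payload itself contains the end marker, which is part of why it is a "hope for the best" solution.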