I have some JavaScript code in which I need to find the start and end indexes of every regular expression literal.
How can this information be extracted from UglifyJS?
Since nobody has answered yet, I came up with a brute-force solution that works, though it is probably not the best one:
    function enumRegEx(parsed) {
        var result = [];
        function loop(obj) {
            if (obj && typeof obj === 'object') {
                // Mark visited objects to avoid infinite recursion on mutual references.
                if (obj.used) {
                    return;
                }
                obj.used = true;
                if (obj instanceof Array) {
                    obj.forEach(function (d) {
                        loop(d);
                    });
                } else if (obj instanceof uglify.AST_Node) {
                    // Descend into every property of an AST node.
                    for (var v in obj) {
                        loop(obj[v]);
                    }
                } else if (obj instanceof uglify.AST_Token) {
                    if (obj.type === 'regexp') {
                        result.push({
                            startIdx: obj.col,
                            endIdx: obj.endcol
                        });
                    }
                }
            }
        }
        loop(parsed);
        return result;
    }
Things I don't like about this approach:
1. I'm using it against a huge, 30,000-line JavaScript file, which UglifyJS parses in 240ms, and then my algorithm takes another 430ms just to enumerate the regular expressions. This seems quite inefficient.
2. I have to modify the original objects by adding a property named used, because the parsed structure contains mutual references, which otherwise lead to infinite recursion and call stack exhaustion. I'm not too worried about that, though, since I'm not using the parsed data for anything else.
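One way to avoid the second problem is to track visited objects in a Set and replace the recursion with an explicit stack. The following is only a sketch: the name enumRegExNoMutation and the isRegExpToken predicate are made up for illustration, and unlike the function above it descends into every object it meets rather than only into arrays and AST nodes. I have not measured whether it is any faster.

```javascript
// Sketch: collect regexp token positions without adding a `used` flag to nodes.
// `isRegExpToken` is a hypothetical caller-supplied predicate, e.g.
//   function (o) { return o instanceof uglify.AST_Token && o.type === 'regexp'; }
function enumRegExNoMutation(parsed, isRegExpToken) {
    const result = [];
    const seen = new Set();   // visited objects, kept outside the AST
    const stack = [parsed];   // explicit stack instead of recursion
    while (stack.length > 0) {
        const obj = stack.pop();
        if (!obj || typeof obj !== 'object' || seen.has(obj)) {
            continue;
        }
        seen.add(obj);
        if (isRegExpToken(obj)) {
            result.push({ startIdx: obj.col, endIdx: obj.endcol });
            continue; // a matching token is treated as a leaf
        }
        // Push children in reverse so they are popped in their original order.
        const children = Object.values(obj);
        for (let i = children.length - 1; i >= 0; i--) {
            stack.push(children[i]);
        }
    }
    return result;
}
```

Because the visited set lives outside the parsed structure, the AST is left untouched and can still be used afterwards.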
If you know a better approach, please share it! At this point I'm mostly interested in improving the performance of the enumeration, which is currently quite slow compared to the parsing itself.
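One candidate worth trying is UglifyJS's own TreeWalker, which visits each AST node once, so there is no need for visited flags or for scanning every property. The sketch below assumes the UglifyJS 2.x API (TreeWalker, AST_RegExp, and the start/end tokens on each node); the function name enumRegExWalk is hypothetical, and the module is passed in as a parameter only to keep the snippet self-contained. I have not benchmarked it against the version above.

```javascript
// Sketch only: `uglify` is assumed to be the module returned by
// require('uglify-js') (2.x API), and `parsed` the result of uglify.parse(code).
function enumRegExWalk(uglify, parsed) {
    const result = [];
    const walker = new uglify.TreeWalker(function (node) {
        // AST_RegExp is the node class for regular expression literals.
        if (node instanceof uglify.AST_RegExp) {
            result.push({
                startIdx: node.start.col,  // first token of the literal
                endIdx: node.end.endcol    // last token of the literal
            });
        }
    });
    parsed.walk(walker);
    return result;
}
```

Under these assumptions it would be called as enumRegExWalk(uglify, uglify.parse(code)), where uglify is the result of require('uglify-js').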