I have been trying to port invRegex.py to a node.js implementation for a while, but I'm still struggling with it. I already have the regular expression parse tree thanks to
It sounds like you're asking for a Lazy Cartesian Product: you do want the Cartesian product, but you don't want to compute every element beforehand (and consume all that memory). Said another way, you want to iterate through the Cartesian product.
If that's right, have you checked out this JavaScript implementation of the X(n) formula? With that, you can either iterate over the elements in natural order <<0,0,0>, <0,0,1>, <0,1,0>, ...> or choose an arbitrary position to calculate.
It seems like you can just do:
// move through them in natural order, without precomputation
lazyProduct(tokens, function (token) { console.log(token); });
// or...
// pick ones out at random
var LP = new LazyProduct(tokens);
console.log(LP.item(7)); // the 7th from the list, without precompute
console.log(LP.item(242)); // the 242nd from the list, again without precompute
Surely I must be missing something...? Generators are simply overkill given the X(n) formula.
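To make the "arbitrary position" idea concrete, here is a minimal sketch of how such a lookup can work. It is not the code of the linked implementation: productItem and the sample array are names I made up for illustration. The index is treated as a mixed-radix number whose digits select one entry from each array.

```javascript
// Hypothetical helper (not the linked library's code): returns the n-th tuple
// (0-based, natural order) of the Cartesian product of `sets` without
// building the product itself.
function productItem(sets, n) {
  const total = sets.reduce(function (acc, s) { return acc * s.length; }, 1);
  if (!Number.isInteger(n) || n < 0 || n >= total) {
    throw new RangeError('index ' + n + ' is outside the product (size ' + total + ')');
  }
  const tuple = new Array(sets.length);
  // Peel digits off from the last array to the first, so the last array
  // varies fastest, matching <0,0,0>, <0,0,1>, <0,1,0>, ...
  for (let i = sets.length - 1; i >= 0; i--) {
    const len = sets[i].length;
    tuple[i] = sets[i][n % len];
    n = Math.floor(n / len);
  }
  return tuple;
}

// Assumed sample data, for illustration only.
const sample = [['a', 'b'], ['x', 'y', 'z'], ['0', '1']];
console.log(productItem(sample, 7).join('')); // bx1
```

Memory use is one tuple per call. One caveat: with plain Numbers the index is only exact up to Number.MAX_SAFE_INTEGER, so a very large product would need BigInt arithmetic instead.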
Update
I have placed an instrumented version of the lazyProduct code, a sample tokens array of arrays, and a call to lazyProduct with those tokens into a JSFiddle.
When you run the code without modification, you'll see it generates the 0@a, etc. output expected from the sample tokens array. I think the link explains the logic pretty well, but in summary: if you uncomment the instrumentation in lazyProduct, you'll notice that there are two key variables, lens and p. lens is a pre-compute of the length of each array (in the array of arrays) passed in. p is a stack that holds the current path up to where you are (e.g., if you're at "1st array 3rd index, 2nd array 2nd index, and 3rd array 1st index", p represents that), and this is what's passed into your callback function.
My callback function just does a join on the arguments (per your OP example), but again these are just the corresponding values in p mapped to the original array of arrays.
If you profile this further, you'll see the footprint needed to build the Cartesian product is limited to just what you need to call your callback function. Try it on one of your worst case tokens to see.
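For readers who can't open the fiddle, the walk described above can be sketched roughly like this. This is my own minimal reconstruction, not the instrumented code itself: walkProduct and the sample input are assumptions, and only lens and p are named after the variables mentioned above.

```javascript
// A sketch of the depth-first walk; `walkProduct` is a hypothetical name.
function walkProduct(sets, callback) {
  if (sets.length === 0) return;
  // lens: length of each inner array, computed once up front.
  const lens = sets.map(function (s) { return s.length; });
  // p: the current path, one chosen value per array.
  const p = [];
  const last = sets.length - 1;
  function dive(depth) {
    for (let i = 0; i < lens[depth]; i++) {
      p[depth] = sets[depth][i];
      if (depth === last) {
        callback.apply(null, p); // p is spread into the callback's arguments
      } else {
        dive(depth + 1);
      }
    }
  }
  dive(0);
}

// Assumed sample tokens, chosen to produce output in the 0@a style.
const seen = [];
walkProduct([['0', '1'], ['@'], ['a', 'b']], function () {
  seen.push(Array.prototype.join.call(arguments, ''));
});
console.log(seen); // [ '0@a', '0@b', '1@a', '1@b' ]
```

Apart from whatever the callback keeps, the only state is lens and p, both proportional to the number of arrays rather than to the size of the product.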
Update 2
I coded about 75% of an approach based on Cartesian product. My API took a ret.js parse tree, converted that to RPN, then generated sets of sets to pass into an X(n) calculator. Using @ruud's example of ([ab]{2}){1,2}|[cd](f|ef{0,2}e), this would be generated:
new Option([
  new Set([
    new Set(new Set(['a','b']), new Set(['a','b'])),
    new Set(new Set(['a','b']), new Set(['a','b']))
  ]),
  new Set(
    ['c', 'd'],
    new Option([
      new Set(['f']),
      new Set(['e']),
      new Set(['e','f']),
      new Set(new Set(['e','f']), new Set(['e', 'f']))
    ])
  )
])
The tricky parts were nested options (buggy) and inverse character classes & back-references (unsupported).
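To give a flavour of how an X(n)-style lookup extends to nested options, here is a rough sketch. It does not use the Option/Set classes above: the node shapes ({ alt: [...] }, { seq: [...] }, plain strings) and the names countStrings and itemAt are made up for illustration. An alternation contributes the sum of its children's counts, and a sequence contributes the product.

```javascript
// Hypothetical node shapes (not the Option/Set classes from this answer):
//   'text'           -> a literal string
//   { alt: [nodes] } -> alternation: exactly one child
//   { seq: [nodes] } -> concatenation: every child, in order
function countStrings(node) {
  if (typeof node === 'string') return 1;
  if (node.alt) {
    return node.alt.reduce(function (sum, c) { return sum + countStrings(c); }, 0);
  }
  return node.seq.reduce(function (prod, c) { return prod * countStrings(c); }, 1);
}

// Returns the n-th string (0-based) without enumerating the others.
function itemAt(node, n) {
  if (typeof node === 'string') {
    if (n !== 0) throw new RangeError('index out of range');
    return node;
  }
  if (node.alt) {
    // Skip whole branches until n falls inside one of them.
    for (const child of node.alt) {
      const c = countStrings(child);
      if (n < c) return itemAt(child, n);
      n -= c;
    }
    throw new RangeError('index out of range');
  }
  // Sequence: mixed radix, with the last child varying fastest.
  const parts = new Array(node.seq.length);
  for (let i = node.seq.length - 1; i >= 0; i--) {
    const c = countStrings(node.seq[i]);
    parts[i] = itemAt(node.seq[i], n % c);
    n = Math.floor(n / c);
  }
  if (n !== 0) throw new RangeError('index out of range');
  return parts.join('');
}

// [cd](f|ef{0,2}e), with f{0,2} written out by hand as '', 'f', 'ff'.
const tree = {
  seq: [
    { alt: ['c', 'd'] },
    { alt: ['f', { seq: ['e', { alt: ['', 'f', 'ff'] }, 'e'] }] }
  ]
};
console.log(countStrings(tree)); // 8
console.log(itemAt(tree, 3));    // ceffe
```

This sketch assumes the branches of an alternation never produce the same string; overlapping branches would be counted twice. It also says nothing about inverse character classes or back-references.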
This approach was getting brittle, and really the Iterator solution is superior. Converting from your parse tree into that should be pretty straightforward. Thanks for the interesting problem!