This is better explained with an example. I want to achieve a split like this:
two-separate-tokens-this--is--just--one--token-another
->
str.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);
Check this fiddle.
Explanation:
The non-greedy pattern (?!-)(.*?[^\-]) matches a string that neither starts nor ends with a dash character, and the lookahead (?=(?:-(?!-)|$)) requires that match to be followed by a single dash character or by the end of the string. The /g modifier makes match return all occurrences, not just the first one.
Edit (based on OP's comment):
str.match(/(?:[^\-]|--)+/g);
Check this fiddle.
Explanation:
The pattern (?:[^\-]|--) matches either a non-dash character or a double dash. The + quantifier repeats that pattern as many times as possible. The /g modifier makes match return all occurrences, not just the first one.
Note:
The pattern /(?:[^-]|--)+/g works in JavaScript as well, but JSLint requires the - inside square brackets to be escaped and reports an error otherwise.
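As a quick check, both patterns from this answer can be run against the sample string from the question. This is only a sketch: the variable names are mine, and the expected array is my reading of what the question asks for.

```javascript
// Sample input from the question.
var str = "two-separate-tokens-this--is--just--one--token-another";

// First pattern: lazy match bounded by a single dash or the end of the string.
var first = str.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);

// Second pattern: runs of non-dash characters and double dashes.
var second = str.match(/(?:[^\-]|--)+/g);

console.log(first);
console.log(second);
// Both log: ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```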
@Ωmega has the right idea in using match instead of split, but his regex is more complicated than it needs to be. Try this one:
s.match(/[^-]+(?:--[^-]+)*/g);
It reads exactly the way you expect it to work: Consume one or more non-hyphens, and if you encounter a double hyphen, consume that and go on consuming non-hyphens. Repeat as necessary.
EDIT: Apparently the source string may contain runs of two or more consecutive hyphens, which should not be treated as delimiters. That can be handled by adding a + to the second hyphen:
s.match(/[^-]+(?:--+[^-]+)*/g);
You can also use a {min,max} quantifier:
s.match(/[^-]+(?:-{2,}[^-]+)*/g);
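To see what the extra quantifier changes, here is a small sketch comparing the first and last patterns from this answer. The input "a---b-c" is a made-up example and the variable names are my own.

```javascript
var simple = /[^-]+(?:--[^-]+)*/g;      // exactly two hyphens join a token
var extended = /[^-]+(?:-{2,}[^-]+)*/g; // two or more hyphens join a token

// Sample input from the question: only single and double hyphens.
var sample = "two-separate-tokens-this--is--just--one--token-another";
var sampleTokens = sample.match(extended);
console.log(sampleTokens);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Hypothetical input with a run of three hyphens.
var tripled = "a---b-c";
var withSimple = tripled.match(simple);
var withExtended = tripled.match(extended);
console.log(withSimple);   // ["a", "b", "c"]
console.log(withExtended); // ["a---b", "c"]
```

With the simple pattern the triple hyphen breaks the token apart, while the extended pattern keeps it as part of the token.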
You would need a negative lookbehind assertion as well as your negative lookahead:
(?<!-)-(?!-)
http://regexr.com?31qrn
Unfortunately, JavaScript's regular expression engine did not support lookbehind assertions before ES2018. In engines without them, I believe the only workaround is to inspect your results afterwards and remove any matches that would have failed the lookbehind assertion (or, in this case, combine them back into a single match).
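For engines that do implement lookbehind (it was added to the language in ES2018), the expression above can be passed straight to split. A minimal sketch, assuming such an engine; in older ones the regex literal itself is a syntax error:

```javascript
// Assumes an engine with lookbehind support (ES2018 or later).
var str = "two-separate-tokens-this--is--just--one--token-another";

// Split on a hyphen that has no hyphen directly before or after it.
var tokens = str.split(/(?<!-)-(?!-)/);

console.log(tokens);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```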
You can achieve this without a negative lookbehind (which, as @jbabey mentioned, is not supported in JS) like this (inspired by this article):
\b-\b
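A short sketch of how this behaves with split. Note that \b only matches between a word character (letter, digit or underscore) and a non-word character, so this approach assumes every delimiter hyphen sits between two word characters. The second input is a made-up example showing that limitation.

```javascript
var str = "two-separate-tokens-this--is--just--one--token-another";

// A hyphen with a word boundary on both sides is a lone hyphen between words.
var tokens = str.split(/\b-\b/);
console.log(tokens);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Hypothetical input: "!" is not a word character, so there is no
// boundary before the hyphen and nothing is split.
var unsplit = "a!-b".split(/\b-\b/);
console.log(unsplit); // ["a!-b"]
```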
Given that the regular expressions weren't very good with edge cases (like 5 consecutive delimiters) and I had to deal with replacing the double delimiters with a single one (which gets tricky, because '----'.replace('--', '-') gives '---' rather than '--'), I wrote a function that loops over the characters and does everything in one go (although I'm concerned that using the string accumulator can be slow :-s)
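var f = function(id, delim) {
    var result = [];
    var acc = '';   // characters of the token being built
    var i = 0;
    while (i < id.length) {
        if (id[i] == delim) {
            if (id[i+1] == delim) {
                // doubled delimiter: keep a single one and skip the second
                acc += delim;
                i++;
            } else {
                // lone delimiter: the current token is complete
                result.push(acc);
                acc = '';
            }
        } else {
            acc += id[i];
        }
        i++;
    }
    // an empty last token is skipped
    if (acc != '') {
        result.push(acc);
    }
    return result;
};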
and some tests:
> f('a-b--', '-')
["a", "b-"]
> f('a-b---', '-')
["a", "b-"]
> f('a-b---c', '-')
["a", "b-", "c"]
> f('a-b----c', '-')
["a", "b--c"]
> f('a-b----c-', '-')
["a", "b--c"]
> f('a-b----c-d', '-')
["a", "b--c", "d"]
> f('a-b-----c-d', '-')
["a", "b--", "c", "d"]
(If the last token is empty, it's meant to be skipped)
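A side note on the replace behaviour mentioned at the top of this answer: with a string pattern, replace only substitutes the first occurrence, which is where the '---' comes from. A global regex replaces every non-overlapping pair instead. A small sketch (the variable names are mine):

```javascript
// String pattern: only the first "--" is replaced.
var firstOnly = '----'.replace('--', '-');
console.log(firstOnly); // "---"

// Global regex: every non-overlapping "--" is replaced.
var allPairs = '----'.replace(/--/g, '-');
console.log(allPairs); // "--"

// An odd run leaves one unpaired delimiter behind.
var oddRun = '-----'.replace(/--/g, '-');
console.log(oddRun); // "---"
```

The odd-run case is still ambiguous after the replacement, which is part of why a single pass over the characters is easier to reason about.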
I don't know how to do it purely with the regex engine in JS. You could do it this way, which is a little less involved than parsing the string manually:
var str = "two-separate-tokens-this--is--just--one--token-another";
str = str.replace(/--/g, "#!!#");
var split = str.split(/-/);
for (var i = 0; i < split.length; i++) {
    split[i] = split[i].replace(/#!!#/g, "--");
}
Working demo: http://jsfiddle.net/jfriend00/hAhAB/
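The same idea can be wrapped in a function. This is only a sketch: splitOnSingleDash is a hypothetical name, and it assumes the placeholder contains no dash and never occurs in the input (it throws if the placeholder is found in the input).

```javascript
// Hypothetical helper; the placeholder must not contain "-" and must not
// occur in the input string.
function splitOnSingleDash(str, placeholder) {
    placeholder = placeholder || "#!!#";
    if (str.indexOf(placeholder) !== -1) {
        throw new Error("placeholder occurs in input");
    }
    return str
        .split("--").join(placeholder)  // protect double dashes
        .split("-")                     // split on the remaining single dashes
        .map(function (token) {
            // restore the protected double dashes
            return token.split(placeholder).join("--");
        });
}

var tokens = splitOnSingleDash("two-separate-tokens-this--is--just--one--token-another");
console.log(tokens);
// ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Made-up input: a run of three dashes is read as a double dash plus a delimiter.
var odd = splitOnSingleDash("a---b");
console.log(odd); // ["a--", "b"]
```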