Split string with a single occurrence (not twice) of a delimiter in JavaScript

星月不相逢 2020-12-18 05:04

This is better explained with an example. I want to achieve a split like this:

two-separate-tokens-this--is--just--one--token-another

->

6 Answers
  • 2020-12-18 05:19

    str.match(/(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g);

    Check this fiddle.


    Explanation:

    The non-greedy pattern (?!-)(.*?[^\-]) matches a string that neither starts nor ends with a dash, and the lookahead (?=(?:-(?!-)|$)) requires that match to be followed by a single dash or by the end of the string. The /g modifier makes match return all occurrences, not just the first one.


    Edit (based on OP's comment):

    str.match(/(?:[^\-]|--)+/g);

    Check this fiddle.

    Explanation:

    The pattern (?:[^\-]|--) matches either a non-dash character or a double dash. The + quantifier repeats that group as many times as possible. The /g modifier makes match return all occurrences, not just the first one.

    Note:

    The pattern /(?:[^-]|--)+/g works in JavaScript as well, but JSLint requires - to be escaped inside square brackets and reports an error otherwise.
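As a quick check of the two patterns in this answer, the sketch below runs both against the sample string from the question. The variable names are illustrative and not part of the original answer, and the second input, "a---b", is an assumed edge case chosen to show where the two patterns behave differently.

```javascript
// Sample input from the question.
const sample = "two-separate-tokens-this--is--just--one--token-another";

// First pattern: lazy match that must end on a non-dash and be followed
// by a single dash or the end of the string.
const lookaheadPattern = /(?!-)(.*?[^\-])(?=(?:-(?!-)|$))/g;

// Second pattern: repeat "non-dash character or double dash".
const alternationPattern = /(?:[^\-]|--)+/g;

console.log(sample.match(lookaheadPattern));
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]
console.log(sample.match(alternationPattern));
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// An odd run of dashes is where the two patterns disagree.
console.log("a---b".match(lookaheadPattern));   // → ["a---b"]
console.log("a---b".match(alternationPattern)); // → ["a--", "b"]
```

Which of the two results is right for a run of three dashes depends on what such a run is supposed to mean in the input, which the question does not specify.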

  • 2020-12-18 05:23

    @Ωmega has the right idea in using match instead of split, but his regex is more complicated than it needs to be. Try this one:

    s.match(/[^-]+(?:--[^-]+)*/g);
    

    It reads exactly the way you expect it to work: Consume one or more non-hyphens, and if you encounter a double hyphen, consume that and go on consuming non-hyphens. Repeat as necessary.


    EDIT: Apparently the source string may contain runs of two or more consecutive hyphens, which should not be treated as delimiters. That can be handled by adding a + to the second hyphen:

    s.match(/[^-]+(?:--+[^-]+)*/g);
    

    You can also use a {min,max} quantifier:

    s.match(/[^-]+(?:-{2,}[^-]+)*/g);
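To make the difference between the first pattern and the edited ones concrete, here is a small sketch that applies them to the sample string and to two assumed edge-case inputs. The names `input`, `strict`, and `relaxed` are hypothetical and only used for this illustration.

```javascript
// Sample input from the question.
const input = "two-separate-tokens-this--is--just--one--token-another";

// Only exact double hyphens are kept inside a token.
const strict = /[^-]+(?:--[^-]+)*/g;
// Runs of two or more hyphens are kept inside a token.
const relaxed = /[^-]+(?:-{2,}[^-]+)*/g;

console.log(input.match(strict));
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]
console.log(input.match(relaxed));
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// A run of three hyphens: the strict pattern breaks the token apart,
// the relaxed one keeps it whole.
console.log("a---b".match(strict));  // → ["a", "b"]
console.log("a---b".match(relaxed)); // → ["a---b"]

// Trailing hyphens are never part of a match, because every hyphen run
// must be followed by at least one non-hyphen.
console.log("a-b--".match(relaxed)); // → ["a", "b"]
```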
    
  • 2020-12-18 05:27

    You would need a negative lookbehind assertion as well as your negative lookahead:

    (?<!-)-(?!-)
    

    http://regexr.com?31qrn

    Unfortunately, JavaScript regular expressions did not support lookbehind assertions until ES2018. In engines without them, I believe the only workaround is to inspect your results afterwards and remove any matches that would have failed the lookbehind assertion (or in this case, combine them back into a single match).
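For runtimes that do implement lookbehind assertions (added to the language in ES2018, so this is an assumption about your engine and was not an option when the answer was written), the pattern above can be passed straight to split. The names below are illustrative only.

```javascript
// Assumes an engine with ES2018 lookbehind support; older engines
// throw a SyntaxError when parsing this regex literal.
const singleHyphen = /(?<!-)-(?!-)/;

const tokens = "two-separate-tokens-this--is--just--one--token-another"
    .split(singleHyphen);
console.log(tokens);
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Hyphens inside a longer run are never treated as delimiters.
console.log("a---b".split(singleHyphen)); // → ["a---b"]
```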

  • 2020-12-18 05:28

    You can achieve this without negative lookbehind (as @jbabey mentioned, it is not supported in older JS engines) like this (inspired by this article):

    \b-\b
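A short sketch of how this pattern behaves when used with split. It relies on word boundaries, so it only treats a hyphen as a delimiter when both neighbours are word characters (letters, digits, or underscore). The second input is an assumed example, not from the question, showing that limitation.

```javascript
// A hyphen with a word character on each side is a delimiter.
const boundaryHyphen = /\b-\b/;

const parts = "two-separate-tokens-this--is--just--one--token-another"
    .split(boundaryHyphen);
console.log(parts);
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]

// Limitation: "!" is not a word character, so there is no word boundary
// between "!" and "-", and the single hyphen is not split on.
console.log("x!-y".split(boundaryHyphen)); // → ["x!-y"]
```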
    
  • 2020-12-18 05:34

    The regular expressions did not handle edge cases well (such as 5 consecutive delimiters), and I also had to replace each double delimiter with a single one. That gets tricky too, because '----'.replace('--', '-') gives '---' rather than '--'. So I wrote a function that loops over the characters and does everything in one pass (although I'm concerned that using a string accumulator can be slow :-s)

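    // Splits id on single occurrences of delim. A doubled delim is treated
    // as an escaped literal and collapsed to one delim inside the token.
    var f = function(id, delim) {
        var result = [];
        var acc = '';
        var i = 0;
        while (i < id.length) {
            if (id[i] === delim) {
                if (id[i + 1] === delim) {
                    // Doubled delimiter: keep one copy and skip the second.
                    acc += delim;
                    i++;
                } else {
                    // Single delimiter: the current token ends here.
                    result.push(acc);
                    acc = '';
                }
            } else {
                acc += id[i];
            }
            i++;
        }

        // An empty trailing token is skipped.
        if (acc !== '') {
            result.push(acc);
        }

        return result;
    };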
    

    and some tests:

    > f('a-b--', '-')
    ["a", "b-"]
    > f('a-b---', '-')
    ["a", "b-"]
    > f('a-b---c', '-')
    ["a", "b-", "c"]
    > f('a-b----c', '-')
    ["a", "b--c"]
    > f('a-b----c-', '-')
    ["a", "b--c"]
    > f('a-b----c-d', '-')
    ["a", "b--c", "d"]
    > f('a-b-----c-d', '-')
    ["a", "b--", "c", "d"]
    

    (If the last token is empty, it's meant to be skipped)
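For comparison, the same splitting and unescaping can be sketched with a regex for the hyphen case. This is not the author's function: `splitAndUnescape` is a hypothetical name, it hard-codes "-" as the delimiter, and it skips empty tokens everywhere, so an input with a leading single hyphen such as '-a' would give ["a"] here but ["", "a"] with the loop above.

```javascript
// Hypothetical regex-based variant, hard-coded to "-" as the delimiter.
function splitAndUnescape(id) {
    // Each chunk is a run of non-hyphens and double hyphens;
    // match returns null when nothing matches, hence the fallback.
    const chunks = id.match(/(?:[^-]|--)+/g) || [];
    // Collapse every escaped "--" inside a chunk to a single "-".
    return chunks.map(function (chunk) {
        return chunk.replace(/--/g, "-");
    });
}

console.log(splitAndUnescape("a-b----c-d"));  // → ["a", "b--c", "d"]
console.log(splitAndUnescape("a-b-----c-d")); // → ["a", "b--", "c", "d"]
```

Because replace is called with the /g flag, all double hyphens in a chunk are collapsed in one left-to-right pass, which avoids the '----'.replace('--', '-') problem mentioned in this answer.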

  • 2020-12-18 05:41

    I don't know how to do it purely with the regex engine in JS. You could do it this way, which is a little less involved than manually parsing:

    var str = "two-separate-tokens-this--is--just--one--token-another";
    str = str.replace(/--/g, "#!!#");
    var split = str.split(/-/);
    for (var i = 0; i < split.length; i++) {
        split[i] = split[i].replace(/#!!#/g, "--");
    }
    

    Working demo: http://jsfiddle.net/jfriend00/hAhAB/
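The placeholder approach only works if the placeholder never occurs in the input. A minimal sketch that wraps the same idea in a function and checks that assumption first is shown below. The name `splitWithPlaceholder` and the default placeholder "\u0000" are assumptions made for this illustration, not part of the original answer.

```javascript
// Hypothetical helper: same replace/split/restore idea, with a guard.
// The placeholder must not contain "-" and must not occur in the input.
function splitWithPlaceholder(str, placeholder) {
    placeholder = placeholder || "\u0000";
    if (str.indexOf(placeholder) !== -1) {
        throw new Error("placeholder occurs in the input");
    }
    return str
        .replace(/--/g, placeholder)   // hide every double hyphen
        .split("-")                    // split on the remaining single hyphens
        .map(function (part) {
            // Restore the double hyphens; split/join avoids having to
            // regex-escape the placeholder.
            return part.split(placeholder).join("--");
        });
}

console.log(splitWithPlaceholder("two-separate-tokens-this--is--just--one--token-another"));
// → ["two", "separate", "tokens", "this--is--just--one--token", "another"]
```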
