JavaScript and RegEx: Split and keep delimiter

Posted by ↘锁芯ラ on 2019-12-02 00:31:02

Try to use match instead:

var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

Update: added a required whitespace match (\s).

Explanation:

  • &#? matches & followed by an optional # (the question mark matches the preceding character zero or one times).

  • [a-zA-Z0-9] is a character class covering all upper- and lower-case ASCII letters and the digits. If you also want to accept an underscore, you could replace it with \w.

  • The + sign means the preceding pattern must match one or more times, so it matches one or more of the characters a-z, A-Z and 0-9.

  • The ; matches the character ;.

  • \s matches a single whitespace character. That includes space, tab and other whitespace characters such as line breaks.

  • [^&]* is once again a character class, but since ^ is its first character the class is negated: instead of matching &, it matches any character except &. The star matches the pattern zero or more times.

  • The g after the closing / means global: matching continues after the first match, and match returns an array of all matches.

So: match & and an optional #, followed by one or more letters or digits, followed by ;, followed by one whitespace character, followed by zero or more characters that aren't &.
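
As a quick illustration of the pattern described above, here is a minimal sketch. The sample string is made up for this example (the paragraph from the question is not shown here), so the output listed in the comment applies only to this input.

```javascript
// Hypothetical sample input; not the text from the original question.
var paragraph = "&larr; First line &#8594; Second line &amp; Third line";

// Each match starts at an entity and runs up to, but not including, the next "&".
var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

console.log(test);
// → [ '&larr; First line ', '&#8594; Second line ', '&amp; Third line' ]
```

Note that match returns null rather than an empty array when nothing matches, so real code may want to guard against that before iterating.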

As I said in the comment, this solution (untested, by the way) will only work if you're just managing <br /> elements. Here:

var text = paragraph.split("<br />"); // now text contains just the text on each line

for(var i = 0; i<text.length-1; i++) { // don't add a line break to the last line
    text[i] += " <br />"; // re-append the <br /> that split() removed
}

The variable text is now an array, where each element is a line of the original paragraph. The line breaks (<br />) have been added back to the end of every line except the last. You mentioned that you want to split on the special characters, but from what I can see each line ends in a line break, so this should hopefully have the same effect. Unfortunately I don't have the time to write up a more complete answer at the moment.
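
For anyone who wants to try the split-and-re-append idea on concrete data, here is a small runnable sketch. The sample string and the variable names sample and lines are assumptions for illustration. Unlike the loop above, it appends "<br />" without a leading space, so that joining the pieces gives back the input exactly.

```javascript
// Hypothetical sample input, assumed to use exactly "<br />" as the line break.
var sample = "Line one<br />Line two<br />Line three";

var lines = sample.split("<br />").map(function (line, i, all) {
  // Re-append the delimiter to every piece except the last one.
  return i < all.length - 1 ? line + "<br />" : line;
});

console.log(lines);
// → [ 'Line one<br />', 'Line two<br />', 'Line three' ]
```

This only works when every line break is written exactly as "<br />"; variants such as "<br>" or "<br/>" would need a regular expression as the separator.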

Using regex it is pretty simple:

var result = input.match(/&#?[^\W_]+;\s[^&]*/g);
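
The class [^\W_] here means "not a non-word character and not an underscore", which for ASCII input is the same set as [a-zA-Z0-9]. The sketch below compares the two patterns on a made-up string (the input and the variable names are assumptions for illustration). It also shows that anything that does not fit the pattern, such as an entity name containing an underscore, is silently left out of the result.

```javascript
// Hypothetical sample input; "&a_b;" is deliberately not a valid match.
var input = "&nbsp; alpha &#169; beta &a_b; gamma";

var withNegatedClass = input.match(/&#?[^\W_]+;\s[^&]*/g);
var withExplicitClass = input.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);

console.log(withNegatedClass);
// → [ '&nbsp; alpha ', '&#169; beta ' ]
console.log(JSON.stringify(withNegatedClass) === JSON.stringify(withExplicitClass));
// → true
```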

