Question
I have a regex which splits my string into an array.
Everything works fine, except that I would like to keep part of the delimiter.
Here is my regex:
(&#?[a-zA-Z0-9]+;)[\s]
In JavaScript, I am doing:
var test = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
My paragraph is as follows:
Current addresses: † Biopharmaceutical Research and Development<br />
‡ Clovis Oncology<br />
§ Pisces Molecular <br />
|| School of Biological Sciences
¶ Department of Chemistry<br />
The problem is that I am getting 10 elements in my array instead of the 5 I expect. Each delimiter also comes back as a separate element, whereas my goal is to keep the delimiter attached to the text that follows it, not to create a new element for it.
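The extra elements come from how String.prototype.split treats a capturing group in the separator: every captured substring is spliced into the result array. The sketch below illustrates this on a hypothetical version of the paragraph, assuming the raw string holds HTML entities that render as the symbols above (the names &dagger;, &Dagger;, &sect; and &para; are guesses). It also shows one alternative that does not appear in the answers: splitting on a zero-width lookahead, which cuts before each entity without consuming it.

```javascript
// Hypothetical raw input: assumes the symbols in the question are rendered
// HTML entities; &dagger;, &Dagger;, &sect; and &para; are guesses.
const paragraph =
  "Current addresses: &dagger; Biopharmaceutical Research and Development<br />\n" +
  "&Dagger; Clovis Oncology<br />\n" +
  "&sect; Pisces Molecular <br />\n" +
  "|| School of Biological Sciences\n" +
  "&para; Department of Chemistry<br />";

// The question's pattern: the capturing group makes split() insert each
// matched entity into the result as its own element, and the whitespace
// after it is consumed and discarded.
const withCapture = paragraph.split(/(&#?[a-zA-Z0-9]+;)[\s]/g);
console.log(withCapture.length); // 9

// Zero-width lookahead: split *before* each "entity + whitespace" without
// consuming anything, so the entity stays at the start of the next piece.
const withLookahead = paragraph.split(/(?=&#?[a-zA-Z0-9]+;\s)/);
console.log(withLookahead.length); // 5
console.log(JSON.stringify(withLookahead[2])); // "&Dagger; Clovis Oncology<br />\n"
```

With this input the capture version returns the four entities as separate elements, while the lookahead version returns the "Current addresses: " preamble followed by the four entries, each still starting with its entity.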
Thank you very much for your help.
EDIT:
I would like to get this as a result:
1. † Biopharmaceutical Research and Development<br />
2. ‡ Clovis Oncology<br />
3. § Pisces Molecular <br />
|| School of Biological Sciences
4. ¶ Department of Chemistry<br />
Answer 1:
Try using match instead:
var test = paragraph.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
Updated: Added a required white-space \s match.
Explanation:
&#? matches & and an optional # (the question mark matches the preceding token zero or one time).
[a-zA-Z0-9] is a range of all upper- and lower-case letters and digits. If you also accept an underscore, you could replace this with \w.
The + sign means that the previous pattern must match one or more times, so it matches one or more of the characters a-z, A-Z and 0-9.
The ; matches the character ;.
The \s matches the white-space class, which includes space, tab and other white-space characters.
[^&]* is once again a range, but since ^ is its first character the range is negated: instead of matching the & character, it matches everything except &. The star matches the pattern zero or more times.
The g at the end, after the last /, means global: it makes match continue after the first match and return an array of all matches.
So: match & and an optional #, followed by one or more letters or digits, followed by ;, followed by a white-space character, followed by zero or more characters that are not &.
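To see what Answer 1's pattern returns, here is a small sketch run against a hypothetical entity-encoded version of the question's paragraph (the entity names are assumptions, since the page shows only the rendered symbols).

```javascript
// Hypothetical input: assumes the raw string holds HTML entities
// (&dagger;, &Dagger;, &sect;, &para; are guesses at the originals).
const sample =
  "Current addresses: &dagger; Biopharmaceutical Research and Development<br />\n" +
  "&Dagger; Clovis Oncology<br />\n" +
  "&sect; Pisces Molecular <br />\n" +
  "|| School of Biological Sciences\n" +
  "&para; Department of Chemistry<br />";

// Answer 1's pattern. With the g flag, match() returns every match: each one
// starts at an entity followed by whitespace and runs up to the next "&".
// match() returns null when nothing matches, so fall back to an empty array.
const items = sample.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g) || [];

// Trim the trailing newline of each item and print a numbered list.
items
  .map((item) => item.trim())
  .forEach((item, i) => console.log(`${i + 1}. ${item}`));
```

The text before the first entity ("Current addresses: ") is not part of any match, and the "|| School of Biological Sciences" line stays with item 3, as in the question's EDIT. Because [^&]* stops at the first &, an entity inside an entry (such as &amp;) would cut that entry short.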
Answer 2:
As I said in the comment, this solution (untested, by the way) will only work if you're just managing <br /> elements. Here:
var text = paragraph.split("<br />"); // now text contains just the text on each line
for (var i = 0; i < text.length - 1; i++) { // don't add a line break to the last piece (whatever follows the final <br />)
text[i] += "<br />"; // re-append exactly the <br /> that split removed
}
The variable text is now an array, where each element of the array is a line of the original paragraph. The linebreaks (<br />) have been added back on the end of each line. You just mentioned that you want to split on the special characters, but from what I see, each line ends in a line break, so this should hopefully have the same effect. Unfortunately I don't have the time to write up a more complete answer at the moment.
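A self-contained version of Answer 2's loop, run on a hypothetical three-line string, next to an alternative that is not part of the answer: splitting on the lookbehind (?<=<br \/>), which requires an ES2018-capable engine.

```javascript
// Hypothetical sample: three lines, each terminated by "<br />".
const markup =
  "&dagger; Biopharmaceutical Research and Development<br />" +
  "&Dagger; Clovis Oncology<br />" +
  "&para; Department of Chemistry<br />";

// Answer 2's approach: split on "<br />", then re-append it to every piece
// except the last (here the empty string after the final "<br />").
const lines = markup.split("<br />");
for (let i = 0; i < lines.length - 1; i++) {
  lines[i] += "<br />";
}

// Alternative (ES2018+ lookbehind): split at the zero-width position right
// after each "<br />", so the marker stays on the line it ends.
const viaLookbehind = markup.split(/(?<=<br \/>)/);

console.log(lines.length, viaLookbehind.length); // 4 3
```

Both keep each <br /> on the line it ends. The difference is that the loop version leaves a trailing empty string when the input ends with <br />, while the lookbehind split does not produce one, because a zero-width match at the very end of the string is never used as a split point.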
Answer 3:
Using a regex, it is pretty simple:
var result = input.match(/&#?[^\W_]+;\s[^&]*/g);
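Answer 3's class [^\W_] is a double negation: it excludes non-word characters (\W) and the underscore, leaving \w without _. The sketch below checks that this accepts the same characters as Answer 1's [a-zA-Z0-9] over every UTF-16 code unit, assuming the regex has no u flag (with both u and i, \w can match a couple of extra non-ASCII characters). It then compares both answers' patterns on a hypothetical entity-encoded sample.

```javascript
// [^\W_]: "not a non-word character and not an underscore", i.e. \w minus _.
const negated = /^[^\W_]$/;
const explicit = /^[a-zA-Z0-9]$/;

// Compare the two classes on every UTF-16 code unit. Without the u flag,
// \w is exactly [A-Za-z0-9_], so no mismatch is expected.
let mismatches = 0;
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (negated.test(ch) !== explicit.test(ch)) {
    mismatches++;
  }
}
console.log(mismatches); // 0

// Hypothetical entity-encoded sample (entity names assumed).
const input =
  "&sect; Pisces Molecular <br />\n" +
  "|| School of Biological Sciences\n" +
  "&para; Department of Chemistry<br />";

const answer1 = input.match(/&#?[a-zA-Z0-9]+;\s[^&]*/g);
const answer3 = input.match(/&#?[^\W_]+;\s[^&]*/g);
console.log(JSON.stringify(answer1) === JSON.stringify(answer3)); // true
```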
Source: https://stackoverflow.com/questions/12317499/javascript-and-regex-split-and-keep-delimiter