I am struggling with building a regular expression for parsing this kind of strings (bible scriptures):
\'John 14:16–17, 25–26\'
\'John 14:16–17\'
\'Jo
Use this regex :
[A-Za-z]+( ([0-9]+)(:[0-9]+)?([\-–][0-9]+)?(, [0-9]+[\-–][0-9]+)?)?
Or in its 'prettier' version :
\w+( (\d+)(:\d+)?([\-–]\d+)?(, \d+[\-–]\d+)?)?
UPDATED : To match dashes or hyphens
NOTE : I've tested it and it matches ALL 5 possible versions.
Example : http://regexr.com?30h4q
I think this does what you need:
\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?
Assumptions:
-
and –
Below is the regex with comments:
"
\w # Match a single character that is a “word character” (letters, digits, and underscores)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 2
: # Match the character “:” literally
\d # Match a single digit 0..9
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 3
[-–] # Match a single character present in the list “-–”
\d # Match a single digit 0..9
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 4
, # Match the character “,” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\d # Match a single digit 0..9
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
[-–] # Match a single character present in the list “-–”
\d # Match a single digit 0..9
{1,2} # Between one and 2 times, as many times as possible, giving back as needed (greedy)
)? # Between zero and one times, as many times as possible, giving back as needed (greedy)
"
And here are some examples of its usage in php:
if (preg_match('/\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?/', $subject)) {
# Successful match
} else {
# Match attempt failed
}
Get an array of all matches in a given string
preg_match_all('/\w+\s?(\d{1,2})?(:\d{1,2})?([-–]\d{1,2})?(,\s\d{1,2}[-–]\d{1,2})?/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
Try this here
\b[a-zA-Z]+(?:\s+\d+)?(?::\d+(?:–\d+)?(?:,\s*\d+(?:–\d+)?)*)?
See and test it here on Regexr
Because of the (?:,\s*\d+(?:–\d+)?)*
at the end you can have a list of verses, verses ranges at the end.
(\b[a-zA-Z]\w+\s\d+)(:\d+)+([-–]\d+)?([,;](\s)?(\d+:)?\d+([-–]\d+)?)?
This is a hybrid of all code presented here. The only formats it will not highlight are "book name only" or "book & chapter only" (just add ":1-all" after chapter #) I found the other codes provided to qualify too many variations, not in line with bible verse syntax.
These are the examples I tested in RegExr: (can't post images yet)
Johnhumbolt 14:16–17, 25–26
John 14:16–17
John 14:16
John 77:3; 2:9-11
John 5:1-allbrad 555-783-6867
John 6
hi there how are you
Ezra 32:5 John 14:16-17, 25-36
1223:34
John 14:16-17,25-36
John 14:16-17; 32:25
([1|2|3]?([i|I]+)?(\s?)\w+(\s+?))((\d+)?(,?)(\s?)(\d+))+(:?)((\d+)?([\-–]\d+)?(,(\s?)\d+[\-–]\d+)?)?
works for almost every book...