Say I have a regex matching a hexadecimal 32-bit number:
([0-9a-fA-F]{1,8})
When I construct a regex where I need to match this multiple ti
When you want to use a sub-expression multiple times without rewriting it, you can group it then call it as a subroutine. Subroutines may be called by name, index, or relative position.
Subroutines are supported by PCRE, Perl, Ruby, PHP, Delphi, R, and others. Unfortunately, the .NET Framework regex engine does not support them, but there are PCRE libraries for .NET that you can use instead (such as https://github.com/ltrzesniewski/pcre-net).
Here's how subroutines work: let's say you have a sub-expression [abc] that you want to repeat three times in a row.
Standard RegEx
Any: [abc][abc][abc]
Subroutine, by Name
Perl: (?'name'[abc])(?&name)(?&name)
PCRE: (?P<name>[abc])(?P>name)(?P>name)
Ruby: (?<name>[abc])\g<name>\g<name>
Subroutine, by Index
Perl/PCRE: ([abc])(?1)(?1)
Ruby: ([abc])\g<1>\g<1>
Subroutine, by Relative Position
Perl: ([abc])(?-1)(?-1)
PCRE: ([abc])(?-1)(?-1)
Ruby: ([abc])\g<-1>\g<-1>
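To try the three call styles listed so far, here is a minimal Ruby sketch. The constant names, the helper three_letters?, and the \A...\z anchors are my own additions for the demo and are not part of the patterns above.

```ruby
# Three ways to call the group [abc] as a subroutine in Ruby.
# \A and \z anchor the whole string; they are added for this demo only.
BY_NAME     = /\A(?<ch>[abc])\g<ch>\g<ch>\z/
BY_INDEX    = /\A([abc])\g<1>\g<1>\z/
BY_RELATIVE = /\A([abc])\g<-1>\g<-1>\z/

PATTERNS = [BY_NAME, BY_INDEX, BY_RELATIVE]

# A subroutine call re-runs the group's pattern, so each of the three
# positions may hold a different letter from [abc].
def three_letters?(text)
  PATTERNS.all? { |re| re.match?(text) }
end

puts three_letters?("abc")  # every position is in [abc]
puts three_letters?("abd")  # 'd' is outside [abc]
```

Note that the named and numbered variants are kept in separate patterns here, since Ruby does not allow numbered calls in a pattern that also uses named groups.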
Subroutine, Predefined
This defines a subroutine without executing it.
Perl/PCRE: (?(DEFINE)(?'name'[abc]))(?P>name)(?P>name)(?P>name)
Matches a valid IPv4 address string, from 0.0.0.0 to 255.255.255.255:
((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))\.(?1)\.(?1)\.(?1)
Without subroutines:
((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))\.((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))\.((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))\.((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))
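A quick way to check the subroutine version is to run it in a flavor that supports calls. The sketch below uses Ruby, where (?1) is spelled \g<1>. The \A and \z anchors and the helper name ipv4? are assumptions added for the demo, so that the whole string has to be an address.

```ruby
# One octet (0-255) is written once as group 1, with the same alternatives
# as the pattern above. \g<1> is Ruby's spelling of the (?1) call.
IPV4 = /\A((?:25[0-5])|(?:2[0-4][0-9])|(?:[0-1]?[0-9]?[0-9]))\.\g<1>\.\g<1>\.\g<1>\z/

# True when the whole string is four dot-separated octets.
def ipv4?(text)
  IPV4.match?(text)
end

%w[0.0.0.0 192.168.0.1 255.255.255.255 256.1.1.1 1.2.3].each do |candidate|
  puts "#{candidate}: #{ipv4?(candidate)}"
end
```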
And to solve the original posted problem:
(?<from>(?P<hexnum>[0-9a-fA-F]{1,8}))\s*:\s*(?<to>(?P>hexnum))
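Here is how that could look in Ruby, which writes the call as \g<hexnum> instead of (?P>hexnum). This is only a sketch: HEX_RANGE and parse_range are hypothetical names chosen for the example.

```ruby
# hexnum is defined once inside "from" and called again inside "to".
HEX_RANGE = /(?<from>(?<hexnum>[0-9a-fA-F]{1,8}))\s*:\s*(?<to>\g<hexnum>)/

# Returns [from, to] as strings, or nil when the text does not match.
def parse_range(text)
  m = HEX_RANGE.match(text)
  m && [m[:from], m[:to]]
end

p parse_range("12345678 :87654321")
p parse_range("no colon here")
```

Only the from and to groups are read back, since those are the names the caller cares about; hexnum is just the shared definition.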
http://regular-expressions.info/subroutine.html
http://regex101.com/
There is no such predefined class. I think you can simplify it using the ignore-case option, e.g.:
(?i)(?<from>[0-9a-f]{1,8})\s*:\s*(?<to>[0-9a-f]{1,8})
Why not do something like this? It's not really shorter, but it is a bit more maintainable.
String.Format(@"(?<from>{0})\s*:\s*(?<to>{0})", "[0-9a-fA-F]{1,8}");
If you want more self-documenting code, I would assign the number regex string to a properly named constant.
.NET regex does not support pattern recursion. In Ruby and PHP/PCRE you can use (?<from>(?<hex>[0-9a-fA-F]{1,8}))\s*:\s*(?<to>(\g<hex>)) (where hex is a "technical" named capturing group whose name should not occur in the main pattern), but in .NET you can instead define the block(s) as separate variables and then use them to build a dynamic pattern.
Starting with C# 6, you may use an interpolated string literal that looks very much like a PCRE/Onigmo subpattern recursion, but is actually cleaner and has no potential bottleneck when the group is named identically to the "technical" capturing group:
C# demo:
using System;
using System.Text.RegularExpressions;

public class Test
{
    public static void Main()
    {
        var block = "[0-9a-fA-F]{1,8}";
        var pattern = $@"(?<from>{block})\s*:\s*(?<to>{block})";
        Console.WriteLine(Regex.IsMatch("12345678 :87654321", pattern));
    }
}
The $@"..." is a verbatim interpolated string literal, where escape sequences are treated as combinations of a literal backslash and a char after it. Make sure to define literal { with {{ and } with }} (e.g. $@"(?:{block}){{5}}" to repeat a block 5 times).
For older C# versions, use string.Format:
var pattern = string.Format(@"(?<from>{0})\s*:\s*(?<to>{0})", block);
as suggested in Mattias's answer.
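For comparison, the same build-the-pattern-from-a-variable idea also works in flavors that do have subroutines. Below is a rough Ruby sketch; HEX_BLOCK, FROM_TO and from_to are made-up names for illustration, and the input string is the one from the C# demo.

```ruby
# The shared fragment lives in an ordinary string.
HEX_BLOCK = "[0-9a-fA-F]{1,8}"

# Ruby interpolates #{...} inside a regex literal, so quantifier braces
# such as {1,8} are left alone and need no doubling.
FROM_TO = /(?<from>#{HEX_BLOCK})\s*:\s*(?<to>#{HEX_BLOCK})/

# Returns [from, to] as strings, or nil when the text does not match.
def from_to(text)
  m = FROM_TO.match(text)
  m && [m[:from], m[:to]]
end

p from_to("12345678 :87654321")
```

Because the fragment is pasted in as text, each copy is an independent piece of the pattern, so there is no extra "technical" group to name.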
If I am understanding your question correctly, you want to reuse certain patterns to construct a bigger pattern?
string f = @"fc\d+/";
string e = @"\d+";
Regex regexObj = new Regex(f+e);
Other than this, using backreferences will only help if you are trying to match the exact same string that you have previously matched somewhere in your regex.
e.g. /\b([a-z])\w+\1\b/ will only match text and spaces in the following text:
This is a sample text which is not the title since it does not end with 2 spaces.
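To see the backreference at work, here is a small Ruby sketch. It checks the sentence word by word with an anchored variant of the pattern (\A and \z instead of \b); that variant and the names TEXT, SAME_ENDS and same_end_words are my own choices for the demo.

```ruby
TEXT = "This is a sample text which is not the title since it does not end with 2 spaces."

# \1 must repeat the exact character captured by ([a-z]), so a word
# qualifies only when it starts and ends with the same lowercase letter.
SAME_ENDS = /\A([a-z])\w+\1\z/

# Splits the text into words and keeps those the pattern accepts.
def same_end_words(text)
  text.scan(/\w+/).select { |word| SAME_ENDS.match?(word) }
end

p same_end_words(TEXT)  # => ["text", "spaces"]
```

"This" is skipped because the pattern is case-sensitive: the captured character would be "T", and the word ends in "s".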
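To refer back to a named capture group, use the syntax \k<name> or \k'name'. Note that this is a backreference: it matches the same text the group captured, not the group's pattern again.
So the following matches only when the second number is identical to the first: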
(?<from>[0-9a-fA-F]{1,8})\s*:\s*\k<from>
More info: http://www.regular-expressions.info/named.html
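The difference between \k<name> and a subroutine call is easy to check in Ruby, which supports both. This is only a sketch: the constant names and the \A...\z anchors are assumptions added so that the whole string has to match.

```ruby
# \k<from> repeats the captured text; \g<from> re-runs the group's pattern.
BACKREF    = /\A(?<from>[0-9a-fA-F]{1,8})\s*:\s*\k<from>\z/
SUBROUTINE = /\A(?<from>[0-9a-fA-F]{1,8})\s*:\s*\g<from>\z/

# Reports which of the two patterns accept the given text.
def compare(text)
  { backref: BACKREF.match?(text), subroutine: SUBROUTINE.match?(text) }
end

p compare("1234:1234")           # same number on both sides
p compare("12345678 :87654321")  # two different numbers
```

With two different numbers only the subroutine version matches, which is why a backreference does not fit a from/to pair.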