I am creating a regex to check whether the copyright info at the top of all documents is formatted correctly.
The copyright is long, therefore my regex is long too.
Let's say that the copyright info looks like:
/*/////////////////////////////////////////////////////////////////////////
Copyright content which is a lot goes in here.
Programmer: Tono Nam
/////////////////////////////////////////////////////////////////////////*/
Then I will use the regex:
var pattern =
@"/\*/////////////////////////////////////////////////////////////////////////
Copyright content which is a lot goes in here.
Programmer: (?<ProgammerName>[\w '\.]+)
/////////////////////////////////////////////////////////////////////////\*/";
If I apply the regex to the first text it gives me a match and everything is great. The problem is when the regex does not match. Let's say that a programmer placed an extra / at the top. My regex will not match anymore. With this example it is simple to notice, but the real copyright is much longer and it would be nice to know where the error is. Sometimes there are also misspellings; for example, you might encounter Programer instead of Programmer. Just because of that I have to look through the whole copyright and try to find the error. I think there should be a simpler way of doing what I need.
Edit
If the subject happens to be:
/*/////////////////////////////////////////////////////////////////////////
Copyright content which is a lot goes in here SOME_MISPELED_WORD.
Programmer: Tono Nam
/////////////////////////////////////////////////////////////////////////*/
then the regex will not match because of SOME_MISPELED_WORD,
therefore I would like to know the index where the error occurred so that I can look at:
/*/////////////////////////////////////////////////////////////////////////
Copyright content which is a lot goes in here <-------------- here
instead of the whole thing.
Another example would be if the copyright info is:
/*/////////////////////////////////////////////////////////////////////////
Copyright content which is a lot goes in here.
Programmer: Tono Nam
//////////////////////////////////////////////////////////////////////////*/
I would like to get an error at the last line because there is an extra /.
I think having the regex as you have it above is far too strict. Try something more like the following:
@"^/\*(/*)(.*)(Programmer:|Programer:){1}(\d*)(<ProgrammerName>){1}(/*)\*/$"
That will make sure you are in a comment block, it can have any number of forward slashes at the start and end, and it will not restrict what can be entered in the copyright statement while still making sure the programmer has signed his name properly. Though honestly I think trying to enforce the programmer name in a regex will cause you more hassle than it is worth in the long run. I would recommend pulling that out and just checking to see if the programmer "section" is there.
Finally I have the solution:
Basically we want to know where the regex fails. If we were to have two strings that do not change, we would be able to compare them and find the first character where they differ. In other words, if I were to have:
var a = "12345";
var b = "1234A";
then we could compare a[0] with b[0], then a[1] with b[1], and so on until we find a difference.
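The character-by-character comparison can be sketched as a small helper. This is only a minimal sketch and not part of the original code: the names HeaderDiff and FirstDifference are hypothetical, and returning -1 for identical strings is an assumed convention.

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class HeaderDiff
{
    // Returns the index of the first character at which the two strings
    // differ, or -1 when they are identical.
    public static int FirstDifference(string expected, string actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i])
            {
                return i;
            }
        }
        // No mismatch in the shared part: either the strings are equal, or one
        // is a prefix of the other and they differ where the shorter one ends.
        return expected.Length == actual.Length ? -1 : common;
    }
}
```

With the two strings above, HeaderDiff.FirstDifference("12345", "1234A") returns 4.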
So let's do that!
Let's say our copyright must look like:
/*/////
Copyright content which is a lot goes in here.
Programmer:Tono Nam
Description:This is the description of the file....
/////*/
Let's remove all the things that can vary so we can apply our first example:
/*/////
Copyright content which is a lot goes in here.
Programmer:
Description:
/////*/
Then the only complicated thing is to create a regex that removes all the things that could vary in order to end up with that string. Note that the \r in it assumes Windows (\r\n) line endings. That pattern will be:
var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";
With that pattern we will be able to turn:
/*/////
Copyright content which is a lot goes in here.
Programmer:Tono Nam bla bla bla
Description:THIS IS A DIFFERENT DESCRIPTION
/////*/
into
/*/////
Copyright content which is a lot goes in here.
Programmer:
Description:
/////*/
Now we have two strings to compare!
Here is the code for what I just explained:
// the subject we want to test
var subject =
@"/*/////
Copyright content which is a lot goes in here.
Programmer:Tono Nam
Description:This is the description of the file....
/////*/";
// the expected header; in a real program this should be a constant because it never changes
var pattern =
@"/*/////
Copyright content which is a lot goes in here.
Programmer:
Description:
/////*/";
// we use this pattern to turn the first subject into the second if we can
var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";
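// note: $1 means group 1. In .NET the unnamed groups are numbered before the named ones,
// so $1$2$3 keeps the three unnamed groups and drops the named groups name and desc
var newSubject = Regex.Replace(subject, regexPattern, "$1$2$3");
// at this point if newSubject == pattern we know that the header is formatted correctly!
// Let's see where they are different!
// only compare up to the shorter length so we never index past the end of either string
var length = Math.Min(pattern.Length, newSubject.Length);
for (int i = 0; i < length; i++)
{
    if (pattern[i] != newSubject[i])
    {
        throw new Exception("There is a problem at index " + i);
    }
}
// one string is a prefix of the other: the difference is where the shorter one ends
if (pattern.Length != newSubject.Length)
{
    throw new Exception("There is a problem at index " + length);
}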
In this example no exception is thrown because my subject is formatted correctly. But if I place an extra / at the beginning, the comparison throws at the position of the extra character (the first line then has 6 / chars where there should have been 5).
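A raw index is hard to use on a long header, so it can help to turn the position into a line and column. The following is a minimal sketch under that assumption, not part of the original answer: the names MismatchLocator and Locate are hypothetical, and lines are assumed to be split on '\n' (a '\r' simply counts as one more column).

```csharp
using System;

// Hypothetical helper, named for illustration only.
public static class MismatchLocator
{
    // Returns the 1-based (line, column) of the first differing character,
    // or null when both strings are identical.
    public static (int Line, int Column)? Locate(string expected, string actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        int line = 1;
        int column = 1;
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i])
            {
                return (line, column);
            }
            if (expected[i] == '\n')
            {
                // A newline in the matching prefix starts a new line.
                line++;
                column = 1;
            }
            else
            {
                column++;
            }
        }
        if (expected.Length == actual.Length)
        {
            return null;
        }
        // One string is a prefix of the other: report where the shorter one ends.
        return (line, column);
    }
}
```

Calling MismatchLocator.Locate(pattern, newSubject) with the variables from the code above would then point at the line to inspect instead of the whole header.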

Try this regex:
/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?
and read the groups named programmer and description. This works for all of the conditions above.
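Reading the two named groups could look roughly like this. It is a sketch only: the pattern is copied unchanged from this answer, while the names HeaderFields and Extract, and the choice to trim the captured values, are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class HeaderFields
{
    // The pattern from the answer above, unchanged.
    private const string Pattern =
        @"/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?";

    // Returns the captured programmer and description,
    // or null when the header does not match the pattern.
    public static (string Programmer, string Description)? Extract(string header)
    {
        Match m = Regex.Match(header, Pattern);
        if (!m.Success)
        {
            return null;
        }
        // In .NET '.' also matches '\r', so trim the carriage return that
        // Windows line endings would leave at the end of each capture.
        return (m.Groups["programmer"].Value.Trim(),
                m.Groups["description"].Value.Trim());
    }
}
```

Because the description part of the pattern is optional, the Description value is an empty string when the header has no description line.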
Source: https://stackoverflow.com/questions/11729186/regex-to-compare-string-and-see-where-is-the-differece