Regex to compare strings and see where the difference is

好久不见. Submitted on 2019-12-02 22:26:13

Question


I am creating a regex to check whether the copyright info at the top of all documents is formatted correctly.

The copyright is long, so my regex is long too.

Let's say that the copyright info looks like:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  Tono Nam

/////////////////////////////////////////////////////////////////////////*/

Then I will use the regex:

var pattern = 

@"/\*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  (?<ProgammerName>[\w '\.]+)

/////////////////////////////////////////////////////////////////////////\*/";

If I apply the regex to the first text, it matches and everything is great. The problem is when the regex does not match. Let's say that a programmer placed an extra / at the top: my regex will no longer match. In this example the error is easy to spot, but the real copyright is much longer, and it would be nice to know where the error is. Sometimes there are misspellings too; for example, you might encounter Programer instead of Programmer. Just because of that, I have to read through the whole copyright to find the error. I think there should be a simpler way of doing what I need.


Edit

If the subject happens to be:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here SOME_MISPELED_WORD.

Programmer: Tono Nam

/////////////////////////////////////////////////////////////////////////*/

Then the regex will not match because of SOME_MISPELED_WORD, so I would like to know the index where the error occurred so that I can look at:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here <-------------- here

instead of the whole thing.


Another example would be if the copyright info is:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer: Tono Nam

//////////////////////////////////////////////////////////////////////////*/

I would like to get an error at the last line because there is an extra /.


Answer 1:


I think having the regex as you have it above is far too strict. Try something more like the following:

@"^/\*(/*)(.*)(Programmer:|Programer:){1}(\d*)(<ProgrammerName>){1}(/*)\*/$"

That will make sure you are in a comment block that can have any number of forward slashes at the start and end, and it will not restrict what can be entered in the copyright statement while still making sure the programmer has signed his name properly. Honestly, though, I think trying to enforce the programmer name in a regex will cause you more hassle than it is worth in the long run. I would recommend pulling that out and just checking whether the programmer "section" is there.
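The looser check suggested in the last sentence could look roughly like the sketch below. The class and method names (HeaderSectionCheck, HasProgrammerSection) are hypothetical and not part of the answer; the only assumption about the header is that the label Programmer: starts a line.

```csharp
using System.Text.RegularExpressions;

public static class HeaderSectionCheck
{
    // Hypothetical helper: returns true when some line of the header
    // starts with the label "Programmer:" (optional whitespace allowed).
    // It deliberately does not validate the name that follows the label.
    public static bool HasProgrammerSection(string header)
    {
        return Regex.IsMatch(header, @"^\s*Programmer\s*:", RegexOptions.Multiline);
    }
}
```

A header that spells the label as Programer: fails this check, but the check alone does not say where the problem is.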




Answer 2:


Finally I have the solution:

Basically we want to know where the regex fails. If we had two strings that do not change, we could compare them and find the character where they differ. In other words, if I had:

var a = "12345";
var b = "1234A";

then we could compare a[0] with b[0], then a[1] with b[1], and so on until we find a difference.

So let's do that!

Let's say our copyright must look like:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/

Let's remove all the things that can vary so we can apply our first example:

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Then the only complicated part is creating a regex that removes everything that can vary, so that we end up with that string. That pattern will be:

 var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

With that pattern we will be able to turn:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam bla bla bla

Description:THIS IS A DIFFERENT DESCRIPTION

/////*/

INTO

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Now we have two strings to compare!




Here is the code for what I just explained:

// the subject we want to test
            var subject =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/";

            // the expected header with the variable parts removed; in a real program this should be a constant because it never changes
            var pattern =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/";

            // we use this pattern to turn the first subject into the second if we can
            // (the \r in the pattern assumes Windows line endings, i.e. \r\n)
            var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

            // note: in .NET unnamed groups are numbered before named ones, so $1, $2 and $3
            // are the three unnamed groups; leaving out name and desc removes them
            var newSubject = Regex.Replace(subject, regexPattern, "$1$2$3");

            // at this point if newSubject = pattern we know that the header is formatted correctly!

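            // Let's see where they are different!
            int shared = Math.Min(pattern.Length, newSubject.Length);
            for (int i = 0; i < shared; i++)
            {
                if (pattern[i] != newSubject[i])
                {
                    throw new Exception("There is a problem at index " + i);
                }
            }
            // one string may be a prefix of the other, e.g. extra text after the closing */
            if (pattern.Length != newSubject.Length)
            {
                throw new Exception("There is a problem at index " + shared);
            }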

In this example it works because my subject is formatted correctly. But if I place an extra / at the beginning, the comparison throws at the index of the extra character (the subject has 6 / characters where there should have been 5).
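A raw character index is hard to use on a long header, so one possible extension is to turn it into a line and column. The sketch below is not part of the original answer: DiffLocator, FirstDifference and ToLineColumn are hypothetical names, and the code assumes lines end in \n (optionally preceded by \r).

```csharp
using System;

public static class DiffLocator
{
    // Returns the index of the first differing character,
    // or -1 when the two strings are identical.
    public static int FirstDifference(string expected, string actual)
    {
        int shared = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        // Same prefix: either identical, or one string is longer.
        return expected.Length == actual.Length ? -1 : shared;
    }

    // Converts a character index into a 1-based (line, column) pair,
    // treating '\n' as the line terminator.
    public static (int Line, int Column) ToLineColumn(string text, int index)
    {
        int line = 1, column = 1;
        for (int i = 0; i < index && i < text.Length; i++)
        {
            if (text[i] == '\n') { line++; column = 1; }
            else { column++; }
        }
        return (line, column);
    }
}
```

Calling DiffLocator.ToLineColumn(newSubject, DiffLocator.FirstDifference(pattern, newSubject)) would then give the position inside the normalized text. Because the removed name and desc parts cannot contain line breaks, the line number should also be valid for the original subject; the column may be shifted on the Programmer and Description lines.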




Answer 3:


Try this Regex:

/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?

and read the groups named programmer and description. This works for all of the above conditions.
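As a rough illustration of reading those two groups in C#, the sketch below wraps the pattern from this answer in a helper. HeaderFields and Extract are hypothetical names introduced here; the pattern itself is copied unchanged.

```csharp
using System.Text.RegularExpressions;

public static class HeaderFields
{
    // Pattern copied verbatim from the answer above.
    private const string Pattern =
        @"/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?";

    // Returns the captured fields, or null when the header does not match.
    // Trim() drops a trailing '\r', which '.' also matches in .NET.
    public static (string Programmer, string Description)? Extract(string header)
    {
        Match m = Regex.Match(header, Pattern);
        if (!m.Success) return null;
        return (m.Groups["programmer"].Value.Trim(),
                m.Groups["description"].Value.Trim());
    }
}
```

Note that this extracts the fields from a tolerant match; on its own it does not report where a malformed header deviates from the expected text.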



Source: https://stackoverflow.com/questions/11729186/regex-to-compare-string-and-see-where-is-the-differece
