Matching hours/minutes/seconds in regular expressions - a better way?

Posted by 雨燕双飞 on 2019-12-21 00:40:08

Question


So I need to get hours, minutes and seconds out of entries like these:

  • 04:43:12
  • 9.43.12
  • 1:00
  • 01.04
  • 59
  • 09

The first two are hours, minutes and seconds. The next two are minutes and seconds. The last two are just seconds.

I came up with this regexp, which works:

\A(?<hours>\d{1,2})(?::|\.)(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<seconds>\d{1,2})\z

But it is ugly, and I want to refactor it so that it is not three separate expressions (mostly just to learn). I tried this:

\A(?:(?<hours>\d{1,2})(?::|\.){0,1})(?:(?<minutes>\d{1,2})(?::|\.){0,1})(?:(?<seconds>\d{1,2}){0,1})\z

But that does not work: minutes and seconds sometimes end up in the wrong groups. My brain is hurting, and I can't figure out what I am doing wrong.
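
One way to see what goes wrong is to run the second pattern over a few of the sample inputs. The sketch below uses Python, which is an assumption (the question does not name a language), so the named groups are written `(?P<name>...)` and `fullmatch` stands in for the `\A`/`\z` anchors. In that pattern the hours and minutes groups are mandatory, and only the delimiters and the seconds group are optional, so a short input fills hours first. The helper name `show` is made up for the illustration.

```python
import re

# The question's second attempt, translated to Python's named-group syntax.
ATTEMPT = re.compile(
    r"(?:(?P<hours>\d{1,2})(?::|\.){0,1})"
    r"(?:(?P<minutes>\d{1,2})(?::|\.){0,1})"
    r"(?:(?P<seconds>\d{1,2}){0,1})"
)

def show(text):
    """Return the captured (hours, minutes, seconds), or None if nothing matches."""
    m = ATTEMPT.fullmatch(text)
    return m.group("hours", "minutes", "seconds") if m else None

for sample in ["04:43:12", "1:00", "59"]:
    print(sample, "->", show(sample))
```

With this pattern "1:00" is captured as hours 1 and minutes 00, and "59" is split into hours 5 and minutes 9, because the engine has to put something into the two mandatory groups before it can reach the optional seconds.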


Answer 1:


My suggestion:

(?:(?:(?<hh>\d{1,2})[:.])?(?<mm>\d{1,2})[:.])?(?<ss>\d{1,2})

structured:

(?:                     # group 1 (non-capturing)
  (?:                   #   group 2 (non-capturing)
    (?<hh>\d{1,2})      #     hours
    [:.]                #     delimiter
  )?                    #   end group 2, make optional
  (?<mm>\d{1,2})        #   minutes
  [:.]                  #   delimiter
)?                      # end group 1, make optional
(?<ss>\d{1,2})          # seconds (required)

If you wish, you can wrap the regex in delimiters - like word boundaries \b or string anchors (^ and $).
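
As a rough sketch of how the nested optional groups behave on the inputs from the question, here is the same pattern in Python (an assumption; the thread does not name a language). Python spells named groups `(?P<name>...)`, `re.VERBOSE` allows the commented layout, and `fullmatch` plays the role of the string anchors. The function name `parse_time` and the choice to report missing parts as 0 are illustrative, not part of the answer.

```python
import re

TIME_RE = re.compile(
    r"""
    (?:                      # optional: minutes, possibly preceded by hours
      (?:                    #   optional: hours
        (?P<hh>\d{1,2})      #     hours
        [:.]                 #     delimiter
      )?
      (?P<mm>\d{1,2})        #   minutes
      [:.]                   #   delimiter
    )?
    (?P<ss>\d{1,2})          # seconds (required)
    """,
    re.VERBOSE,
)

def parse_time(text):
    """Return (hours, minutes, seconds) as ints, or None if text does not match.

    Parts that are absent from the input are reported as 0 (an assumption).
    """
    m = TIME_RE.fullmatch(text)
    if m is None:
        return None
    return tuple(int(m.group(name) or 0) for name in ("hh", "mm", "ss"))

for sample in ["04:43:12", "9.43.12", "1:00", "01.04", "59", "09"]:
    print(sample, "->", parse_time(sample))
```

Because the seconds group is the only required one, a lone number always lands in seconds, and each additional delimiter shifts one more value into minutes and then hours.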

EDIT: Thinking about it, you can restrict that further so that it matches only times that make sense. Use

[0-5]?\d

in place of

\d{1,2}

to capture values between 0 and 59 only, where appropriate (seconds and minutes).
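
A small check of that restriction, again sketched in Python as an assumption about the language. Hours keep `\d{1,2}`, while minutes and seconds use `[0-5]?\d`; the helper name `is_valid` is hypothetical.

```python
import re

# Hours stay \d{1,2}; minutes and seconds are limited to 0-59.
STRICT_RE = re.compile(
    r"(?:(?:(?P<hh>\d{1,2})[:.])?(?P<mm>[0-5]?\d)[:.])?(?P<ss>[0-5]?\d)"
)

def is_valid(text):
    """True if the whole string is a time with minutes and seconds in 0-59."""
    return STRICT_RE.fullmatch(text) is not None

for sample in ["04:43:12", "59", "75", "1:60", "1:75:00"]:
    print(sample, "->", is_valid(sample))
```

Note that the restriction only rejects out-of-range values when the whole string is matched; without anchors or `fullmatch`, a pattern like this could still match part of "75".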




Answer 2:


I haven't tested this yet, but it should work:

^(?:(?:(?<hours>\d\d?)[:\.])?(?<minutes>\d\d?)[:\.])?(?<seconds>\d\d?)$

Edit:
Now I have tested it and verified that it works. :)




Answer 3:


I suggest the following expression.

^(((?<Hour>[0-9]{1,2})[.:])?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$

This will allow single digit hours combined with single digit minutes like 3:7:21. If this is not desired, a slight modification is required.

^(((?<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$

The positive lookahead assertion (?=[0-9]{2}) in the second expression solves this issue: once an hours part has matched, the minutes that follow must be two digits.
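
The difference between the two expressions can be checked with a short comparison. This is a sketch in Python (an assumed language), with the groups rewritten as `(?P<name>...)` and `fullmatch` used instead of `^` and `$`; the names `LOOSE` and `STRICT` are only labels for this example.

```python
import re

# First expression: no constraint on the minutes that follow the hours.
LOOSE = re.compile(
    r"(((?P<Hour>[0-9]{1,2})[.:])?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})"
)

# Second expression: the lookahead demands two digits right after the hours delimiter.
STRICT = re.compile(
    r"(((?P<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})"
)

for sample in ["3:7:21", "3:07:21", "1:00", "9"]:
    loose = LOOSE.fullmatch(sample) is not None
    strict = STRICT.fullmatch(sample) is not None
    print(sample, "-> loose:", loose, "strict:", strict)
```

Both expressions require exactly two digits for seconds, so a single digit such as "9" matches neither; that is fine for the inputs in the question, where seconds are always written with two digits.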




Answer 4:


There is no really good way to do this, as it depends on your particular situation what to do when not all three parts are specified. For example, in many cases I might prefer to interpret 3:30 as 3 hours and 30 minutes instead of 3 minutes and 30 seconds. It can't hurt to be explicit about that, and to make it easy to derive from the regex what these kinds of inputs mean.

Therefore I personally believe that the first regex is not that ugly at all - it might be less "magic", but it is much more readable and maintainable. Make sure you and others can still read and change the code later!

If your language supports it, I would use extended regexes (with support for whitespace and comments) and split it over three lines (or 6 or 9 if you put a comment on a separate line). That won't change the regex, but it will make it feel less ugly for sure.
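
As an illustration of that layout, here is a minimal sketch in Python (an assumption; the thread names no language) using `re.VERBOSE`. Python's `re` rejects duplicate group names, so unlike the original three-part regex each alternative gets its own suffixed names (`h3`, `m3`, `s3`, `m2`, `s2`, `s1`, all made up here) and a small helper, `parse`, merges them afterwards.

```python
import re

TIME_RE = re.compile(
    r"""
      (?P<h3>\d{1,2}) [:.] (?P<m3>\d{1,2}) [:.] (?P<s3>\d{1,2})   # hours, minutes, seconds
    | (?P<m2>\d{1,2}) [:.] (?P<s2>\d{1,2})                        # minutes, seconds
    | (?P<s1>\d{1,2})                                             # seconds only
    """,
    re.VERBOSE,
)

def parse(text):
    """Return (hours, minutes, seconds) as strings, None for absent parts."""
    m = TIME_RE.fullmatch(text)
    if m is None:
        return None
    g = m.groupdict()
    # Only one alternative matches, so at most one value per part is set.
    hours = g["h3"]
    minutes = g["m3"] or g["m2"]
    seconds = g["s3"] or g["s2"] or g["s1"]
    return hours, minutes, seconds

for sample in ["04:43:12", "1:00", "59"]:
    print(sample, "->", parse(sample))
```

Each line of the pattern states one accepted input shape, so changing the meaning of a two-part input (for example to hours and minutes) would only touch the middle alternative.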



Source: https://stackoverflow.com/questions/1400297/matching-hours-minutes-seconds-in-regular-expressions-a-better-way
