Question
So I need to get hours, minutes and seconds out of entries like these:
- 04:43:12
- 9.43.12
- 1:00
- 01.04
- 59
- 09
The first two are hours, minutes and seconds. The next two are minutes and seconds. The last two are just seconds.
I came up with this regexp, which works:
\A(?<hours>\d{1,2})(?::|\.)(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<seconds>\d{1,2})\z
But it is ugly, and I want to refactor it so that it is no longer three separate expressions (mostly just to learn). I tried this:
\A(?:(?<hours>\d{1,2})(?::|\.){0,1})(?:(?<minutes>\d{1,2})(?::|\.){0,1})(?:(?<seconds>\d{1,2}){0,1})\z
But that does not work: minutes and seconds sometimes get mixed up. My brain is hurting, and I can't figure out what I am doing wrong.
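To make the failure concrete, here is a small Python sketch of the second attempt. It assumes Python's `re` module, so the named groups are written `(?P<name>...)` and `\Z` stands in for `\z`; the helper name `parts` is made up for illustration. Because the hours group is mandatory and every delimiter is optional, the engine fills hours first and, for short inputs, splits the digits between hours and minutes.

```python
import re

# The second attempt from the question, translated to Python syntax
# (assumption: (?P<name>...) for named groups, \Z instead of \z).
ATTEMPT = re.compile(
    r"\A(?:(?P<hours>\d{1,2})(?::|\.){0,1})"
    r"(?:(?P<minutes>\d{1,2})(?::|\.){0,1})"
    r"(?:(?P<seconds>\d{1,2}){0,1})\Z"
)


def parts(text):
    """Return the raw (hours, minutes, seconds) strings, or None if no match."""
    m = ATTEMPT.match(text)
    return m.group("hours", "minutes", "seconds") if m else None


print(parts("04:43:12"))  # ('04', '43', '12')  as intended
print(parts("1:00"))      # ('1', '00', None)   minutes landed in hours
print(parts("59"))        # ('5', '9', None)    the digits were split up
```

The hours and minutes groups are required here while only seconds is optional, which is the reverse of what the data needs.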
Answer 1:
My suggestion:
(?:(?:(?<hh>\d{1,2})[:.])?(?<mm>\d{1,2})[:.])?(?<ss>\d{1,2})
Structured:
(?:                     # group 1 (non-capturing)
  (?:                   # group 2 (non-capturing)
    (?<hh>\d{1,2})      # hours
    [:.]                # delimiter
  )?                    # end group 2, make optional
  (?<mm>\d{1,2})        # minutes
  [:.]                  # delimiter
)?                      # end group 1, make optional
(?<ss>\d{1,2})          # seconds (required)
If you wish, you can wrap the regex in delimiters, such as word boundaries (\b) or string anchors (^ and $).
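As a rough illustration, the nested-optional pattern could be used like this in Python. This is a sketch under a few assumptions: the named groups are rewritten in Python's `(?P<name>...)` form, the pattern is anchored with `\A` and `\Z`, and missing parts are treated as 0. The function name `parse_time` is hypothetical.

```python
import re

# Nested optional groups: seconds are required, minutes are optional,
# and hours can only appear when minutes are present.
TIME_RE = re.compile(
    r"\A(?:(?:(?P<hh>\d{1,2})[:.])?(?P<mm>\d{1,2})[:.])?(?P<ss>\d{1,2})\Z"
)


def parse_time(text):
    """Return (hours, minutes, seconds) as ints, or None if text does not match."""
    m = TIME_RE.match(text)
    if m is None:
        return None
    # Groups that took no part in the match are None; treat them as 0.
    return tuple(int(m.group(name) or 0) for name in ("hh", "mm", "ss"))


for sample in ["04:43:12", "9.43.12", "1:00", "01.04", "59", "09"]:
    print(sample, "->", parse_time(sample))
```

With this nesting, "1:00" comes out as 1 minute and 0 seconds, and "59" as 59 seconds, because the innermost optional group (hours) is the first one to be dropped.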
EDIT: Thinking about it, you can restrict that further to capture only times that make sense. Use [0-5]?\d in place of \d{1,2} to capture values between 0 and 59 only, where appropriate (seconds and minutes).
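A quick way to check the effect of the 0 to 59 restriction is sketched below in Python. It assumes the pattern is anchored (without anchors, a partial match inside a longer number would still succeed) and that hours keep the unrestricted `\d{1,2}`. The name `is_valid` is only for illustration.

```python
import re

# Minutes and seconds are limited to 0-59 via [0-5]?\d; hours stay \d{1,2}.
STRICT_RE = re.compile(
    r"\A(?:(?:(?P<hh>\d{1,2})[:.])?(?P<mm>[0-5]?\d)[:.])?(?P<ss>[0-5]?\d)\Z"
)


def is_valid(text):
    """True if text is a time whose minutes and seconds are each 0-59."""
    return STRICT_RE.match(text) is not None


for sample in ["04:43:12", "1:00", "59", "61", "1:60", "60:00"]:
    print(sample, "->", is_valid(sample))
```

Here "61", "1:60" and "60:00" are rejected, while the samples from the question still match.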
Answer 2:
I haven't tested this yet, but it should work:
^(?:(?:(?<hours>\d\d?)[:\.])?(?<minutes>\d\d?)[:\.])?(?<seconds>\d\d?)$
Edit:
Now I have tested it and verified that it works. :)
Answer 3:
I suggest the following expression.
^(((?<Hour>[0-9]{1,2})[.:])?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$
This will allow single-digit hours combined with single-digit minutes, like 3:7:21. If this is not desired, a slight modification is required.
^(((?<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$
The positive lookahead assertion (?=[0-9]{2}) in the second expression solves this issue by requiring two-digit minutes whenever hours are present.
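The difference between the two expressions can be seen with a short Python comparison. This is only a sketch: the named groups are rewritten in Python's `(?P<name>...)` syntax, and the names `LOOSE` and `STRICT` are made up for the example. Note that both expressions require exactly two digits for seconds.

```python
import re

# First expression: single-digit minutes are accepted even after hours.
LOOSE = re.compile(
    r"^(((?P<Hour>[0-9]{1,2})[.:])?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})$"
)

# Second expression: the lookahead after the hour delimiter demands that
# two digits follow, so minutes must be two digits when hours are given.
STRICT = re.compile(
    r"^(((?P<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})$"
)

for sample in ["3:7:21", "3:07:21", "7:21", "21"]:
    print(sample, bool(LOOSE.match(sample)), bool(STRICT.match(sample)))
```

Only "3:7:21" is treated differently: the first expression accepts it and the second rejects it. Single-digit minutes without hours, as in "7:21", are still accepted by both.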
Answer 4:
There is no really good way to do this, because what to do when not all three parts are specified depends on your particular situation. For example, in many cases I would prefer to interpret 3:30 as 3 hours and 30 minutes instead of 3 minutes and 30 seconds. It can't hurt to be explicit about that, and to make it easy to derive from the regex what these kinds of inputs mean.
Therefore I personally believe that the first regex is not that ugly at all - it might be less "magic", but it is much more readable and maintainable. Make sure you and others can still read and change the code later!
If your language supports it, I would use extended regexes (with support for whitespace and comments) and split it over three lines (or 6 or 9 if you put a comment on a separate line). That won't change the regex, but it will make it feel less ugly for sure.
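For what it's worth, here is roughly what that could look like in Python with `re.VERBOSE`. This is a sketch with some assumptions: Python's `re` does not allow the same group name twice, so the alternatives use made-up names (`m2`, `s2`, `s3`) that are merged afterwards, and `\Z` replaces `\z`.

```python
import re

# One alternative per line; whitespace and comments are ignored in VERBOSE mode.
TIME_RE = re.compile(
    r"""
      \A (?P<h>\d{1,2}) [:.] (?P<m>\d{1,2}) [:.] (?P<s>\d{1,2}) \Z   # hours, minutes, seconds
    | \A (?P<m2>\d{1,2}) [:.] (?P<s2>\d{1,2}) \Z                     # minutes, seconds
    | \A (?P<s3>\d{1,2}) \Z                                          # seconds only
    """,
    re.VERBOSE,
)


def parse_time(text):
    """Return (hours, minutes, seconds) as ints, or None if text does not match."""
    match = TIME_RE.match(text)
    if match is None:
        return None
    g = match.groupdict()
    hours = g["h"]
    minutes = g["m"] or g["m2"]
    seconds = g["s"] or g["s2"] or g["s3"]
    # Parts that are absent in the matched alternative default to 0.
    return tuple(int(value or 0) for value in (hours, minutes, seconds))


print(parse_time("04:43:12"), parse_time("1:00"), parse_time("59"))
```

If you later decide that 3:30 should mean hours and minutes, only the group names on the second line need to change, and the comment next to it says what that line means.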
Source: https://stackoverflow.com/questions/1400297/matching-hours-minutes-seconds-in-regular-expressions-a-better-way