问题
I have a regular expression that I want to match a latitude/longitude pair in a variety of fashions, e.g.
123 34 42
-123* 34' 42"
123* 34' 42"
+123* 34' 42"
45* 12' 22"N
45 12' 22"S
90:00:00.0N
I want to be able to match these in a pair such that
90:00:00.0N 180:00:00.0E
is a latitude/longitude pair.
or
45* 12' 22"N 46* 12' 22"E
is a latitude/longitude pair (1 degree by 1 degree cell).
or
123* 34' 42" 124* 34' 42"
is a latitude/longitude pair
etc
Using the below regular expression, when I type in 123, it matches. I suppose this is true since 123 00 00 is a valid coordinate. However, I want to use this regular expression to match pairs in the same format above
"([-|\\+]?\\d{1,3}[d|D|\u00B0|\\s](\\s*\\d{1,2}['|\u2019|\\s])?"
+ "(\\s*\\d{1,2}[\"|\u201d|\\s])?\\s*([N|n|S|s|E|e|W|w])?\\s?)"
I am using Java.
* denotes a degree.
What am I doing wrong in my regular expression?
回答1:
Well, for one thing, you're filling your character sets with a bunch of unnecessary pipe characters - alternation is implied in a []
pair. Additional cleanup: +
doesn't need to be escaped in a character class. Your regular expression seems to be addressing a bigger problem statement than you gave us - you make no mention of d
or D
as matchable character. And you've made pretty much the entire back half of your RegEx optional. Going off of what I think your original problem statement is, I built the following regular expression:
^\s*([+-]?\d{1,3}\*?\s+\d{1,2}'?\s+\d{1,2}"?[NSEW]?|\d{1,3}(:\d{2}){2}\.\d[NSEW]\s*){1,2}$
It's a bit of a doozy, but I'll break it down for you, or anyone who happens across this in the future (Hello, future!).
^
Start of string, simple.
\s*
Any amount of whitespace - even none.
(
Denotes the beginning of a group - we'll get back to that.
[-+]?
An optional sign
\d{1,3}
1 to three digits
\*?
An optional Asterisk - the escape here is key for an asterisk, but if you want to replace this with the unicode codepoint for an actual degree, you won't need it.
\s+
At least one character of whitespace
\d{1,2}
1 or two digits.
'?
Optional apostrophe
\s+\d{1,2}+
You've seen these before, but there's a new curveball - there's a plus after the {1,2}
quantifier! This makes it a possessive quantifier, meaning that the matcher won't give up its matches for this group to make another one possible. This is almost exclusively here to prevent 1 1 11 1 1
from matching, but can be used to increase speed anywhere you're 100% sure you don't need to be able to backtrack.
"?
Optional double quote. You'll have to escape this in Java.
[NSEW]?
An optional cardinal direction, designated by letter
|
OR - you can match everything in the group before this, or everything in the group after this.
\d{1,3}
Old news.
(:\d{2})
A colon, followed by two characters...
{2}
twice!
\.\d
Decimal point, followed by a single digit.
[NSEW]
Same as before, but this time it's mandatory.
\s*)
Some space, and finally the end of the group. Now, the first group has matched an entire longitude/latitude denotation, with an arbitrary amount of space at the end. Followed closely by:
{1,2}
Do that one, or two times - to match a single or a pair, then finally:
$
The end of the string.
This isn't perfect, but it's pretty close, and I think it answers the original problem statement. Plus, I feel my explanation has demystified it enough that you can edit it to further suit your needs. The one thing it doesn't (and won't) do, is enforce that the first coordinate matches the second in style. That's just too much to ask of Regular Expressions.
Doubters: Here it is in action. Please, enjoy.
回答2:
Generally, I dont think that this is a good approach.
In your interface try to have DMS coordinates in one specific format.
The User should enter this in 3 separate text fields.
Further this regex is not very maintainable.
There are much more possibilities to notate a DMS coordinate, you even cannot imagine. Humans are creative.
Eg:
Put N,S in front
or: North, 157 deg 50 min 55.796 sec
or: from wiki: The NGS now says in 1993 that point was 21-18-02.54891 N 157-50-45.90280 W
回答3:
I'm not a RE wizard but with your formats you'd need to have some kind of convention for which pair comes first (probably latitude) if you're doing parsing from a single text box.
From there, you have six numeric fields (deg, min, sec for each, possibly with a decimal point), two signs (+ or - for each) and up to two hemispheres (one for each).
As far as I can see, parsing these 8-10 fields from your input would occur in the same order each time if you demanded only that the latitude is first, and the longitude second. The rest of the symbols (save the decimal point(s)) can be treated essentially as separators.
Does that make it easier?
来源:https://stackoverflow.com/questions/13611668/degrees-minutes-seconds-dms-regex