Question
How does the order of optional patterns in a DateTimeFormatter affect the parsing operation?
I was running this program and wondered why the last line throws an exception but not the first three.
    import java.time.format.DateTimeFormatter;
    import java.util.Locale;

    public class Main {
        public static void main(String[] args) {
            String p1 = "[EEEE][E] dd-MM-yyyy"; // full day name first, then short
            String p2 = "[E][EEEE] dd-MM-yyyy"; // short day name first, then full
            String date1 = "Thu 07-01-2016";
            String date2 = "Thursday 07-01-2016";
            parse(date1, p1); // OK
            parse(date1, p2); // OK
            parse(date2, p1); // OK
            parse(date2, p2); // throws DateTimeParseException
        }

        private static void parse(String date, String pattern) {
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
            System.out.println(fmt.parse(date));
        }
    }
The exception on the last line is:
java.time.format.DateTimeParseException: Text 'Thursday 07-01-2016' could not be parsed at index 3
Answer 1:
The documentation does not mention any precedence and I'll argue that the result you are getting is normal. It is the result of reading the String format from left to right.
Let's consider the first format, "[EEEE][E] dd-MM-yyyy".
- "Thu 07-01-2016": the API first checks whether the first optional section, "[EEEE]", can be matched. Quoting the DateTimeFormatter Javadoc on text tokens: "Exactly 4 pattern letters will use the full form", which in this case is the full form of the day of the week. That does not match "Thu", so this optional section is skipped. The second optional section is "[E]" and, still quoting, "Less than 4 pattern letters will use the short form", so it matches "Thu". The String can therefore be parsed correctly.
- "Thursday 07-01-2016": the same as above, except that the first optional section matches "Thursday". The API then still tries to match the next optional section, "[E]", finds nothing, and skips it.
Now let's consider the second format, "[E][EEEE] dd-MM-yyyy".
- "Thu 07-01-2016": the API checks whether the first optional section, "[E]", can be matched, and it does match "Thu". As above, the API then tries to match "[EEEE]", finds nothing, and skips that optional section.
- "Thursday 07-01-2016": the API tries to match "[E]" again, and this is where the problem arises: it does match. "Thursday" starts with "Thu", so the formatter finds a match. It then tries to parse the rest, which is "rsday 07-01-2016". The "[EEEE]" optional section does not match, so it is skipped. Parsing then fails on the literal space in the pattern, because the remaining text starts with "r" rather than a space.
So if you run your code with
    parse("ThuThursday 07-01-2016", "[E][EEEE] dd-MM-yyyy");
you'll see that it works: "[E]" matched "Thu" and "[EEEE]" matched "Thursday".
Notice how the exception message also hints at this through the index it reports:
java.time.format.DateTimeParseException: Text 'Thursday 07-01-2016' could not be parsed at index 3
Index 3 corresponds to the "r" of "rsday", which means the parser was able to consume the text right up to this point.
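The point about the error index can be checked with a small sketch. The class name ErrorIndexDemo and the helper errorIndex are illustrative names, not part of the original post; the helper relies on DateTimeParseException.getErrorIndex() and returns -1 when the text parses.

```java
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class ErrorIndexDemo {

    // Hypothetical helper: returns -1 if the text parses with the pattern,
    // otherwise the error index carried by the DateTimeParseException.
    public static int errorIndex(String text, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH);
        try {
            fmt.parse(text);
            return -1;
        } catch (DateTimeParseException e) {
            return e.getErrorIndex();
        }
    }

    public static void main(String[] args) {
        String p2 = "[E][EEEE] dd-MM-yyyy";
        String failing = "Thursday 07-01-2016";

        int idx = errorIndex(failing, p2);
        System.out.println("error index: " + idx);
        if (idx >= 0) {
            // Shows the part of the input the parser could not consume.
            System.out.println("unparsed remainder: " + failing.substring(idx));
        }

        // Both optional sections find something to match here.
        System.out.println("ThuThursday result: " + errorIndex("ThuThursday 07-01-2016", p2));
    }
}
```

Printing the remainder from the error index onward makes it easy to see which part of the input the formatter had already consumed.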
Answer 2:
How does the order of optional patterns in a DateTimeFormatter affect the parsing operation?
The parser attempts to match each optional section in the order it appears in the pattern.
Note that the string "Thursday" starts with "Thu", which can be matched by the pattern fragment "E". Next, observe that the matching failure is reported at index 3, which corresponds to the 'r' in "Thursday". What happens in the error case is that the parser matches the first three characters of the string to the first optional section, skips the second optional section because it does not match the next part of the string, and then cannot match the 'r'.
In other words, these formatters do not backtrack to try alternative matches. In regex terms, the optional sections are greedy and never give back what they have already matched.
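To make the regex comparison concrete, here is a rough analogy using java.util.regex. It is only an analogy, not how DateTimeFormatter is implemented, and the class and field names are made up for illustration. An ordinary greedy optional group can be backtracked into, so the regex still matches; a possessive optional group (?+) keeps what it consumed, which mirrors the formatter's behaviour.

```java
import java.util.regex.Pattern;

public class BacktrackingAnalogy {

    // Ordinary greedy optional groups: if the rest of the pattern fails, the
    // regex engine backtracks and lets the first group match nothing.
    public static final Pattern GREEDY =
            Pattern.compile("(?:Thu)?(?:Thursday)? \\d{2}-\\d{2}-\\d{4}");

    // Possessive optional groups: once "Thu" has been consumed it is never
    // given back, so the second group has no chance to match "Thursday".
    public static final Pattern POSSESSIVE =
            Pattern.compile("(?:Thu)?+(?:Thursday)?+ \\d{2}-\\d{2}-\\d{4}");

    public static void main(String[] args) {
        String text = "Thursday 07-01-2016";
        System.out.println("greedy:     " + GREEDY.matcher(text).matches());
        System.out.println("possessive: " + POSSESSIVE.matcher(text).matches());
    }
}
```

The greedy version accepts the text and the possessive version rejects it, for the same reason the "[E][EEEE]" formatter rejects it.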
Note also that both of your patterns are more permissive than perhaps you want. For example, your pattern p1 will match the string "ThursdayThu 07-01-2016".
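A quick way to see this permissiveness is the sketch below. PermissivePatternDemo and accepts are illustrative names, and the input with no day name at all is an extra case added here on the assumption that both optional sections are simply skipped.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

public class PermissivePatternDemo {

    // Pattern p1 from the question.
    public static final DateTimeFormatter P1 =
            DateTimeFormatter.ofPattern("[EEEE][E] dd-MM-yyyy", Locale.ENGLISH);

    // Hypothetical helper: true if P1 accepts the whole text as a date.
    public static boolean accepts(String text) {
        try {
            LocalDate.parse(text, P1);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Both optional sections match one after the other.
        System.out.println(accepts("ThursdayThu 07-01-2016"));
        // Neither optional section matches; only the leading space and date remain.
        System.out.println(accepts(" 07-01-2016"));
    }
}
```

If inputs like these should be rejected, the optional sections alone are not enough and some extra validation of the input would be needed.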
Answer 3:
Order of optional formats matters:
When the parser for format [E][EEEE] dd-MM-yyyy parses "Thursday 07-01-2016" then it
- consumes Thu using the optional section [E]
- skips [EEEE] since it cannot recognize a long day-of-week
- now expects a space and fails since it sees r
and therefore throws an exception with error index 3.
So if you use optional sections to allow parsing of alternative versions (here either the long or the short day-of-week name), put the more specific (longer) format first.
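As a minimal sketch of that advice, with LongFormFirstDemo and its parse helper being illustrative names only, a formatter that lists the long form first handles both inputs from the question:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class LongFormFirstDemo {

    // Long day-of-week name first, short name second.
    public static final DateTimeFormatter LONG_FIRST =
            DateTimeFormatter.ofPattern("[EEEE][E] dd-MM-yyyy", Locale.ENGLISH);

    // Hypothetical helper that parses the text into a LocalDate.
    public static LocalDate parse(String text) {
        return LocalDate.parse(text, LONG_FIRST);
    }

    public static void main(String[] args) {
        System.out.println(parse("Thu 07-01-2016"));
        System.out.println(parse("Thursday 07-01-2016"));
    }
}
```

Both calls should yield the same date, since "[EEEE]" gets the first chance to consume the full name before "[E]" can take only its first three letters.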
Source: https://stackoverflow.com/questions/34657355/importance-of-order-when-using-multiple-optional-patterns