I'm finding this fairly hard to explain, so I'll kick off with a few before/after examples of what I'd like to achieve.
Example of input:
Since every word starts with a capital (uppercase) letter, I would suggest that you first remove all dots by replacing them with the empty string (""). Then iterate over the characters and put a space between a lowercase letter and the uppercase letter that follows it. Also, if you encounter an uppercase letter followed by a lowercase one, put the space before the uppercase letter.
It will work for all examples you provided, but I am not sure if there are any exceptions to my observation.
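A minimal sketch of the steps described above, in case it helps to see them as code. The class and method names (CamelSplit, split) are made up for illustration, and skipping the very first character is my own assumption, added so that no leading space is produced.

```java
class CamelSplit {
    // Removes all dots, then inserts a space before an uppercase letter when it
    // follows a lowercase letter or is itself followed by a lowercase letter.
    static String split(String input) {
        String joined = input.replace(".", "");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            // Assumption: never insert a space before the first character.
            if (i > 0 && Character.isUpperCase(c)) {
                boolean afterLower = Character.isLowerCase(joined.charAt(i - 1));
                boolean beforeLower = i + 1 < joined.length()
                        && Character.isLowerCase(joined.charAt(i + 1));
                if (afterLower || beforeLower) {
                    out.append(' ');
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(split("Hello.World"));      // Hello World
        System.out.println(split("This.Is.A.Test"));   // This Is A Test
        System.out.println(split("The.S.W.A.T.Team")); // The SWAT Team
    }
}
```

These two rules only look at letter case, so inputs with digits or mixed-case abbreviations (for example S.w.a.T.) would need extra handling.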
How about using a regex to remove the dots that need to disappear, and then replacing the remaining dots with spaces? The regex can look like (?<=(^|[.])[\\S&&\\D])[.](?=[\\S&&\\D]([.]|$)).
String[] data = {
    "Hello.World",
    "This.Is.A.Test",
    "The.S.W.A.T.Team",
    "S.w.a.T.",
    "S.w.a.T.1",
    "2001.A.Space.Odyssey" };
for (String s : data) {
    // First delete the dots inside abbreviations, then turn the remaining dots into spaces.
    System.out.println(s.replaceAll(
            "(?<=(^|[.])[\\S&&\\D])[.](?=[\\S&&\\D]([.]|$))", "")
            .replace('.', ' '));
}
Result:
Hello World
This Is A Test
The SWAT Team
SwaT
SwaT 1
2001 A Space Odyssey
In the regex I needed to escape the special meaning of the dot character. I could do that with \\., but I prefer [.].
So at the center of the regex we have a dot literal. This dot is surrounded by (?<=...) and (?=...). These are the two parts of the look-around mechanism, called look-behind and look-ahead.
A dot that needs to be removed is preceded by a dot (or the start of the data, ^) followed by one character that is non-whitespace (\\S) and also non-digit (\\D), so I can test for that with (?<=(^|[.])[\\S&&\\D])[.].
Likewise, a dot that needs to be removed is followed by a non-whitespace, non-digit character and then another dot (or the end of the data, $), which can be written as [.](?=[\\S&&\\D]([.]|$)).
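To see what each half contributes, here is a small sketch that marks the matched dots with # instead of deleting them. The names LookaroundDemo and mark are hypothetical, and [a-zA-Z] is used in place of [\\S&&\\D] only to keep the demo simple.

```java
class LookaroundDemo {
    // Assumption: [a-zA-Z] stands in for [\\S&&\\D] to keep the demo simple.
    static final String BEHIND = "(?<=(^|[.])[a-zA-Z])";
    static final String AHEAD = "(?=[a-zA-Z]([.]|$))";

    // Replaces every dot matched by the given regex with '#'.
    static String mark(String input, String regex) {
        return input.replaceAll(regex, "#");
    }

    public static void main(String[] args) {
        String s = "The.S.W.A.T.Team";
        System.out.println(mark(s, BEHIND + "[.]"));         // The.S#W#A#T#Team
        System.out.println(mark(s, "[.]" + AHEAD));          // The#S#W#A#T.Team
        System.out.println(mark(s, BEHIND + "[.]" + AHEAD)); // The.S#W#A#T.Team
    }
}
```

Only the dots that pass both tests are marked in the last line, and those are exactly the ones the full regex deletes.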
Depending on your needs, [\\S&&\\D], which besides letters also matches characters like !@#$%^&*()-_=+..., can be replaced with [a-zA-Z] for English letters only, or with \\p{IsAlphabetic} for all letters in Unicode.
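A rough sketch of how the character class could be swapped in. The helper names (LetterClassDemo, regexFor, clean) and the sample input are made up for illustration.

```java
class LetterClassDemo {
    // Builds the dot-removal regex around a caller-supplied character class.
    static String regexFor(String charClass) {
        return "(?<=(^|[.])" + charClass + ")[.](?=" + charClass + "([.]|$))";
    }

    // Deletes the dots inside abbreviations, then turns the remaining dots into spaces.
    static String clean(String input, String charClass) {
        return input.replaceAll(regexFor(charClass), "").replace('.', ' ');
    }

    public static void main(String[] args) {
        // Hypothetical input that starts with a non-English letter (\u00C9 is É).
        String s = "\u00C9.U.A.Report";
        System.out.println(clean(s, "[a-zA-Z]"));          // keeps a space after É
        System.out.println(clean(s, "\\p{IsAlphabetic}")); // treats É as a letter
        System.out.println(clean("The.S.W.A.T.Team", "[a-zA-Z]"));
    }
}
```

With [a-zA-Z] the dot after É is not removed, because É is not an English letter, so it ends up as a space. With \\p{IsAlphabetic} that dot is removed like the others in the abbreviation.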