Capitalize first letters in words in the string with different separators using Java 8 stream

Submitted by 爷,独闯天下 on 2021-02-08 09:54:41

Question


I need to capitalize the first letter of every word in a string, but it is not as easy as it seems: a word is any sequence of letters, digits, "_", "-", and "`", while all other characters are separators, i.e. the next letter after them must be capitalized.

Example of what the program should do:

For input: "#he&llo wo!r^ld"

Output should be: "#He&Llo Wo!R^Ld"

There are similar-sounding questions here, but their solutions really don't help. This one, for example:

String output = Arrays.stream(input.split("[\\s&]+"))
                    .map(t -> t.substring(0, 1).toUpperCase() + t.substring(1))
                    .collect(Collectors.joining(" "));

As in my task there can be various separators, this solution doesn't work.


Answer 1:


It is possible to split a string and keep the delimiters, so taking into account the requirement for delimiters:

word is considered to be any sequence of letters, digits, "_" , "-", "`" while all other chars are considered to be separators

the pattern which keeps the delimiters in the result array would be: "((?<=[^-`\\w])|(?=[^-`\\w]))":

[^-`\\w]: all characters except -, backtick and word characters \w: [A-Za-z0-9_]

Then, the "words" are capitalized, and delimiters are kept as is:

static String capitalize(String input) {
    if (null == input || 0 == input.length()) {
        return input;
    }
    return Arrays.stream(input.split("((?<=[^-`\\w])|(?=[^-`\\w]))"))
                 .map(s -> s.matches("[-`\\w]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
                 .collect(Collectors.joining(""));
}

Tests:

System.out.println(capitalize("#he&l_lo-wo!r^ld"));
System.out.println(capitalize("#`he`&l+lo wo!r^ld"));

Output:

#He&L_lo-wo!R^Ld
#`he`&L+Lo Wo!R^Ld
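
To see why this pattern keeps the separators, it can help to look at the pieces that split produces before any capitalization happens. The following is only a minimal sketch: the class name SplitTokensDemo and the helper tokens are made up for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;

class SplitTokensDemo {
    // Same zero-width pattern as above: split right after and right before every separator.
    static final String BOUNDARY = "((?<=[^-`\\w])|(?=[^-`\\w]))";

    // Hypothetical helper: returns the pieces so they can be inspected.
    static List<String> tokens(String input) {
        return Arrays.asList(input.split(BOUNDARY));
    }

    public static void main(String[] args) {
        // prints [#, he, &, l_lo-wo, !, r, ^, ld]
        System.out.println(tokens("#he&l_lo-wo!r^ld"));
    }
}
```

Every separator ends up as its own one-character element, and every word (including the "_" and "-" inside it) stays in one piece, which is what lets the map step capitalize words and pass separators through untouched.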

Update
If you need to handle not only ASCII characters but also other alphabets (e.g. Cyrillic, Greek), the Unicode property class \\p{IsWord} may be used, with Unicode-aware matching enabled through the embedded pattern flag (?U):

static String capitalizeUnicode(String input) {
    if (null == input || 0 == input.length()) {
        return input;
    }
    
    return Arrays.stream(input.split("(?U)((?<=[^-`\\p{IsWord}])|(?=[^-`\\p{IsWord}]))"))
                 .map(s -> s.matches("(?U)[-`\\p{IsWord}]+") ? Character.toUpperCase(s.charAt(0)) + s.substring(1) : s)
                 .collect(Collectors.joining(""));
}

Test:

System.out.println(capitalizeUnicode("#he&l_lo-wo!r^ld"));
System.out.println(capitalizeUnicode("#привет&`ёж`+дос^βιδ/ως"));

Output:

#He&L_lo-wo!R^Ld
#Привет&`ёж`+Дос^Βιδ/Ως
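
As an alternative sketch (not taken from the answer): the (?U) flag corresponds to Pattern.UNICODE_CHARACTER_CLASS, and with that flag the predefined class \w is itself Unicode-aware. Under that assumption the two patterns can be compiled once and reused instead of being recompiled on every call. The class and constant names below are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class UnicodeCapitalizeDemo {
    // UNICODE_CHARACTER_CLASS is the flag that the inline (?U) switches on.
    static final Pattern BOUNDARY = Pattern.compile(
            "(?<=[^-`\\w])|(?=[^-`\\w])", Pattern.UNICODE_CHARACTER_CLASS);
    static final Pattern WORD = Pattern.compile(
            "[-`\\w]+", Pattern.UNICODE_CHARACTER_CLASS);

    static String capitalize(String input) {
        if (input == null || input.isEmpty()) {
            return input;
        }
        return Arrays.stream(BOUNDARY.split(input))
                // Words get their first character upper-cased, separators pass through.
                .map(s -> WORD.matcher(s).matches()
                        ? Character.toUpperCase(s.charAt(0)) + s.substring(1)
                        : s)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(capitalize("#he&l_lo-wo!r^ld"));
        // Cyrillic and Greek input, written with Unicode escapes to keep the source ASCII-only.
        System.out.println(capitalize("#\u043f\u0440\u0438\u0432\u0435\u0442&\u03b2\u03b9\u03b4"));
    }
}
```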



Answer 2:


You can't use split that easily - split will eliminate the separators and give you only the things in between. As you need the separators, no can do.

One real dirty trick is to use something called 'lookahead'. That argument you pass to split is a regular expression. Most 'characters' in a regexp have the property that they consume the matching input. If you do input.split("\\s+") then that doesn't 'just' split on whitespace, it also consumes them: The whitespace is no longer part of the individual entries in your string array.

However, consider ^, $, and \\b. These still match things but don't consume anything. You don't consume 'end of string'. In fact, the regex ^^^hello$$$ matches the string "hello" just as well. You can do this yourself using lookahead: it matches when the lookahead pattern is there but does not consume it:

String[] args = "Hello World$Huh   Weird".split("(?=[\\s_$-]+)");
for (String arg : args) System.out.println("*" + arg + "*");

Unfortunately, this 'works', in that it saves your separators, but isn't getting you all that much closer to a solution:

*Hello*
* World*
*$Huh*
* *
* *
* Weird*

You can go with lookbehind as well, but it's limited; they don't do variable length, for example.

The conclusion should rapidly become: Actually, doing this with split is a mistake.

Then, once split is off the table, you should no longer use streams, either: Streams don't do well once you need to know stuff about the previous element in a stream to do the job: A stream of characters doesn't work, as you need to know if the previous character was a non-letter or not.

In general, "I want to do X, and use Y" is a mistake. Keep an open mind. It's akin to asking: "I want to butter my toast, and use a hammer to do it". Oookaaaaayyyy, you can probably do that, but, eh, why? There are butter knives right there in the drawer, just.. put down the hammer, that's toast. Not a nail.

Same here.

A simple loop can take care of this, no problem:

private static final String BREAK_CHARS = "&-_`";

public String toTitleCase(String input) {
  StringBuilder out = new StringBuilder();
  boolean atBreak = true;
  for (char c : input.toCharArray()) {
    out.append(atBreak ? Character.toUpperCase(c) : c);
    atBreak = Character.isWhitespace(c) || (BREAK_CHARS.indexOf(c) > -1);
  }
  return out.toString();
}

Simple. Efficient. Easy to read. Easy to modify. For example, if you want to go with 'any non-letter counts as a break', trivial: atBreak = !Character.isLetter(c);.
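
As a sketch of how the same loop could be adapted to the question's exact definition of a word (letters, digits, "_", "-", "`", with everything else acting as a separator), the break test can list the word characters instead of the break characters. The names TitleCaseLoopDemo and EXTRA_WORD_CHARS are made up for this illustration.

```java
class TitleCaseLoopDemo {
    // Non-alphanumeric characters that the question still counts as part of a word.
    private static final String EXTRA_WORD_CHARS = "_-`";

    static String toTitleCase(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean atBreak = true; // the start of the string counts as a break
        for (char c : input.toCharArray()) {
            out.append(atBreak ? Character.toUpperCase(c) : c);
            // A break is any character that is not a letter, a digit, '_', '-' or '`'.
            atBreak = !Character.isLetterOrDigit(c) && EXTRA_WORD_CHARS.indexOf(c) < 0;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints #He&Llo Wo!R^Ld
        System.out.println(toTitleCase("#he&llo wo!r^ld"));
    }
}
```

Note that this variant upper-cases the first character of a word, so a word that starts with a digit is left unchanged.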

Contrast to the stream solution which is fragile, weird, far less efficient, and requires a regexp that needs half a page's worth of comment for anybody to understand it.

Can you do this with streams? Yes. You can butter toast with a hammer, too. Doesn't make it a good idea though. Put down the hammer!




Answer 3:


You can use a simple FSM as you iterate over the characters in the string, with two states, either in a word, or not in a word. If you are not in a word and the next character is a letter, convert it to upper case, otherwise, if it is not a letter or if you are already in a word, simply copy it unmodified.

boolean isWord(int c) {
    return c == '`' || c == '_' || c == '-' || Character.isLetter(c) || Character.isDigit(c);
}

String capitalize(String s) {
    StringBuilder sb = new StringBuilder();
    boolean inWord = false;
    for (int c : s.codePoints().toArray()) {
        if (!inWord && Character.isLetter(c)) {
            sb.appendCodePoint(Character.toUpperCase(c));
        } else {
            sb.appendCodePoint(c);
        }
        inWord = isWord(c);
    }
    return sb.toString();
}

Note: I have used codePoints(), appendCodePoint(int), and int so that characters outside the basic multilingual plane (with code points greater than 64k) are handled correctly.
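
To illustrate the difference, here is a small sketch that compares a char-based and a code-point-based way of upper-casing the first character. It assumes the Deseret small letter U+10428 as test input (its upper-case form is U+10400); the class and method names are hypothetical.

```java
class CodePointDemo {
    // Code-point based: reads the whole first character, even if it needs two chars.
    static String upperFirstByCodePoint(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        return new StringBuilder()
                .appendCodePoint(Character.toUpperCase(first))
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    // char based, for comparison: sees only the first UTF-16 code unit.
    static String upperFirstByChar(String s) {
        if (s.isEmpty()) {
            return s;
        }
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        String small = new String(Character.toChars(0x10428)) + "x";
        String capital = new String(Character.toChars(0x10400)) + "x";
        System.out.println(upperFirstByCodePoint(small).equals(capital)); // true
        System.out.println(upperFirstByChar(small).equals(small));        // true: nothing changed
    }
}
```

The char-based version only ever sees a lone surrogate, which has no upper-case mapping, so the letter is left as it was.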




Answer 4:


I need to capitalize first letter in every word

Here is one way to do it. Admittedly this is a mite longer, but your requirement to change the first letter to upper case (not the first digit or first non-letter) required a helper method. Otherwise it would have been easier. Some others seem to have missed this point.

Establish word pattern, and test data.

// The hyphen goes last in the class so it is a literal; "_-`" would be read as a range that excludes '-'.
String wordPattern = "[\\w`-]+";
Pattern p = Pattern.compile(wordPattern);
String[] inputData = { "#he&llo wo!r^ld", "0hel`lo-w0rld" };

Now this simply finds each successive word in the string based on the established regular expression. As each word is found, it changes the first letter in the word to upper case and then puts it into a StringBuilder at the position where the match was found.

for (String input : inputData) {
    // Start from a copy of the input; each match is replaced by a string of the same length.
    StringBuilder sb = new StringBuilder(input);
    Matcher m = p.matcher(input);
    while (m.find()) {
        sb.replace(m.start(), m.end(),
                upperFirstLetter(m.group()));
    }
    System.out.println(input + " -> " + sb);
}

prints

#he&llo wo!r^ld -> #He&Llo Wo!R^Ld
0hel`lo-w0rld -> 0Hel`lo-w0rld

Since words may start with digits and the requirement was to convert the first letter (not the first character) to upper case, this helper method finds the first letter, converts it to upper case, and returns the new string. So 01_hello would become 01_Hello.

    
public static String upperFirstLetter(String word) {
    char[] chs = word.toCharArray();
    for (int i = 0; i < chs.length; i++) {
        if (Character.isLetter(chs[i])) {
            chs[i] = Character.toUpperCase(chs[i]);
            break;
        }
    }
    return String.valueOf(chs);
}
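
The same find-and-replace loop could also be written with Matcher.appendReplacement and Matcher.appendTail, which build the result incrementally instead of editing a copy of the input in place. The following is a minimal sketch under the question's definition of a word; it uses the pattern [\\w`-]+ with the hyphen placed last so that it is a literal character, and the class name AppendReplacementDemo is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementDemo {
    // Words are runs of word characters, backticks and hyphens.
    private static final Pattern WORD = Pattern.compile("[\\w`-]+");

    static String capitalizeWords(String input) {
        Matcher m = WORD.matcher(input);
        StringBuffer sb = new StringBuffer(); // the Java 8 overloads take a StringBuffer
        while (m.find()) {
            // quoteReplacement keeps '$' and backslashes from being read as group references.
            m.appendReplacement(sb, Matcher.quoteReplacement(upperFirstLetter(m.group())));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    // Upper-cases the first letter of the word, skipping leading digits and symbols.
    static String upperFirstLetter(String word) {
        char[] chs = word.toCharArray();
        for (int i = 0; i < chs.length; i++) {
            if (Character.isLetter(chs[i])) {
                chs[i] = Character.toUpperCase(chs[i]);
                break;
            }
        }
        return String.valueOf(chs);
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("#he&llo wo!r^ld")); // #He&Llo Wo!R^Ld
        System.out.println(capitalizeWords("01_hello"));        // 01_Hello
    }
}
```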


Source: https://stackoverflow.com/questions/65569551/capitalize-first-letters-in-words-in-the-string-with-different-separators-using
