Question
I am trying to split a string into a list of substrings, breaking wherever the characters change between numeric and non-numeric. In other words, I want to break the string into distinct groups of digits and letters. For added fun, I'm also trying to trim any leading 0's from each group of digits. Consider the following example.
Say you're given "aoeu01234stnh0987" as your input. The output I want is ["aoeu", "1234", "stnh", "987"].
I made a working example below, but it is somewhat long and confusing. It seems like there must be a better, more concise way to achieve this.
private static List<String> fragmentString(String string) {
    char[] charArr = string.toCharArray();
    StringBuilder tempStr = new StringBuilder();
    StringBuilder tempInt = new StringBuilder();
    List<String> tempList = new ArrayList<>();
    boolean wasPrevNum = false;
    for (char c : charArr) {
        boolean isNum = Character.isDigit(c);
        if (isNum) {
            tempInt.append(c);
            if (!wasPrevNum) {
                wasPrevNum = true;
                tempList.add(tempStr.toString());
                tempStr = new StringBuilder();
            }
        } else {
            tempStr.append(c);
            if (wasPrevNum) {
                while (tempInt.charAt(0) == '0') tempInt.deleteCharAt(0);
                tempList.add(tempInt.toString());
                tempInt = new StringBuilder();
                wasPrevNum = false;
            }
        }
    }
    if (tempInt.length() > 0) while (tempInt.charAt(0) == '0') tempInt.deleteCharAt(0);
    tempList.add(wasPrevNum ? tempInt.toString() : tempStr.toString());
    return tempList;
}
I saw this post about using the split() method, but that solution only works for their very specific case and doesn't apply here. The split() method was the first thing I tried, but I couldn't figure out a regex, and now I'm questioning whether this is even possible using split().
Answer 1:
A very simple solution is to use a regex. The regex \p{L}+|[0-9]+, which matches a sequence of letters or a sequence of digits, can be used to find the substrings. Then try to parse each substring found. If it is an integer, parsing removes the leading zeros; if parsing fails, simply print the substring as is. Note that a digit run too large for an int also fails to parse, so it is printed with its leading zeros intact.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Main {
    public static void main(String[] args) {
        String str = "aoeu01234stnh0987";
        Matcher matcher = Pattern.compile("\\p{L}+|[0-9]+").matcher(str);
        while (matcher.find()) {
            String substring = matcher.group();
            try {
                System.out.println(Integer.parseInt(substring));
            } catch (NumberFormatException e) {
                System.out.println(substring);
            }
        }
    }
}
Output:
aoeu
1234
stnh
987
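If you need the list the question asks for rather than printed lines, the same pattern can feed a List. The following is only a sketch: the class name FragmentSketch and the method fragment are made up for illustration, and it swaps the parse-and-catch step for a regex replacement so that very long digit runs are also trimmed. It assumes an all-zero run should collapse to a single "0".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class FragmentSketch {
    // Same pattern as above: a run of letters or a run of ASCII digits.
    private static final Pattern GROUPS = Pattern.compile("\\p{L}+|[0-9]+");

    static List<String> fragment(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = GROUPS.matcher(input);
        while (matcher.find()) {
            String group = matcher.group();
            // Drop leading zeros but never the last character, so "000" becomes "0".
            // Letter runs never start with '0', so they pass through unchanged.
            parts.add(group.replaceFirst("^0+(?!$)", ""));
        }
        return parts;
    }
}
```

Calling FragmentSketch.fragment("aoeu01234stnh0987") should give [aoeu, 1234, stnh, 987]. As with the original pattern, characters that are neither letters nor digits are skipped.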
Answer 2:
This example isn't much more concise than the code the OP posted. The best I can say is that I'm not using an exception as a part of my processing.
Here are the results from one test run.
aoeu01234stnh0987
[aoeu, 1234, stnh, 987]
Here's the complete runnable example code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringSplitter {

    public static void main(String[] args) {
        StringSplitter ss = new StringSplitter();
        String input = "aoeu01234stnh0987";
        System.out.println(input);
        List<String> output = ss.splitString(input);
        String[] output2 = output.toArray(new String[output.size()]);
        System.out.println(Arrays.toString(output2));
    }

    public List<String> splitString(String input) {
        List<String> output = new ArrayList<>();
        if (input == null || input.length() < 1) {
            return output;
        }
        char c = input.charAt(0);
        boolean isDigit = Character.isDigit(c);
        StringBuilder builder = new StringBuilder();
        builder.append(c);
        for (int i = 1; i < input.length(); i++) {
            c = input.charAt(i);
            if (isDigit == Character.isDigit(c)) {
                builder.append(c);
            } else {
                addToList(output, builder, isDigit);
                builder.delete(0, builder.length());
                builder.append(c);
                isDigit = !isDigit;
            }
        }
        addToList(output, builder, isDigit);
        return output;
    }

    private void addToList(List<String> output,
            StringBuilder builder, boolean isDigit) {
        if (isDigit) {
            output.add(Integer.toString(
                    Integer.valueOf(builder.toString())));
        } else {
            output.add(builder.toString());
        }
    }
}
Answer 3:
You can add some delimiter characters to each group of symbols, and then split the string around these characters:
import java.util.Arrays;
import java.util.stream.IntStream;

String str = "aoeu01234stnh0987";
String[] arr = str.replaceAll("\\d+|\\D+", "$0::::").split("::::", 0);
System.out.println(Arrays.toString(arr)); // [aoeu, 01234, stnh, 0987]

// trim leading zeros from numbers, i.e. parse the integer value
// and convert it back to a string
IntStream.range(0, arr.length)
        .filter(i -> arr[i].replaceAll("\\d+", "").length() == 0)
        .forEach(i -> arr[i] = Integer.valueOf(arr[i]).toString());
System.out.println(Arrays.toString(arr)); // [aoeu, 1234, stnh, 987]
See also: How to split the string into string and integer in java?
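On the question of whether split() alone can do this: one option that avoids inserting a delimiter is to split on zero-width positions using lookarounds. This is a sketch rather than a drop-in solution. The class name LookaroundSplitSketch and the method fragment are hypothetical, and it assumes an all-zero group should become "0".

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the original answer.
class LookaroundSplitSketch {
    static List<String> fragment(String input) {
        // Split at each zero-width position where a digit meets a non-digit;
        // no characters are consumed, so nothing is lost from the groups.
        String[] groups = input.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
        return Arrays.stream(groups)
                // Remove leading zeros only when another digit follows them,
                // which keeps one digit for an all-zero group.
                .map(g -> g.replaceFirst("^0+(?=\\d)", ""))
                .collect(Collectors.toList());
    }
}
```

Unlike a letters-or-digits pattern, this keeps every non-digit character (spaces, punctuation) inside the non-digit groups. Note that split() on an empty string returns a single empty element, so an empty input needs its own check if an empty list is wanted.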
Answer 4:
I'm posting the code I ended up using in production, just in case it benefits anyone; I know there are already some great answers, and I used information from some answers here to come up with this.
private static List<List<String>> fragmentArr(String[] inputArr) {
    List<List<String>> fragArr = new ArrayList<>();
    Arrays.stream(inputArr).forEach(string -> {
        List<String> listToAdd = new ArrayList<>();
        Matcher matcher = Pattern.compile("[^0-9]+|[0-9]+").matcher(string);
        while (matcher.find()) {
            StringBuilder substring = new StringBuilder(matcher.group());
            // Trim leading 0's; the length check keeps one digit so that an
            // all-zero group such as "000" does not run past the end.
            while (substring.length() > 1 && substring.charAt(0) == '0') substring.deleteCharAt(0);
            listToAdd.add(substring.toString());
        }
        fragArr.add(listToAdd);
    });
    return fragArr;
}
I used a while loop to trim 0's instead of converting to an int and back to a string, for two reasons.
1. Time Complexity - If you convert data types for this problem, even using a BigInteger or some other means, you're costing yourself efficiency. Refer to this post on the time complexity of converting to an int and back. Both parseInt and toString are O(n) operations, where n is the entire length of the string. My while loop implementation is O(n), where n is the number of leading 0's.
2. Number Format Exception - If you are passed a string like "0000000000000000000001000000000000000000000", an exception would be thrown if you try to convert the value to an integer to trim the leading 0's, because this value is too large for the integer data type in Java. So this is an edge case to consider.
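A possible refinement of the trimming step, offered as a sketch: as far as I know, each deleteCharAt(0) on a StringBuilder shifts the remaining characters, so the loop's cost also depends on the length of the group. Scanning for the first non-zero index and taking one substring avoids that. The class name LeadingZeroSketch and the method stripLeadingZeros are hypothetical, and the choice to collapse an all-zero group to "0" is an assumption.

```java
// Hypothetical helper class, not part of the original answer.
class LeadingZeroSketch {
    // Returns the input without leading zeros; an all-zero run collapses to "0".
    static String stripLeadingZeros(String digits) {
        int start = 0;
        // Stop one short of the end so that at least one character survives.
        while (start < digits.length() - 1 && digits.charAt(start) == '0') {
            start++;
        }
        // A single substring call copies the remaining characters once.
        return digits.substring(start);
    }
}
```

Because it never converts to a numeric type, this works for digit runs of any length, and non-digit groups pass through unchanged.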
Here's a unit test.
@Test
public void fragmentTest() {
    assertEquals(
        Arrays.asList(
            Arrays.asList("abc", "123", "dce", "456"),
            Arrays.asList("123", "abcde", "444", "a")
        ),
        fragmentArr(new String[]{"abc123dce456", "123abcde444a"})
    );
    assertEquals(
        Arrays.asList(
            Arrays.asList("abc", "1000000000000000000000", "def", "29")
        ),
        fragmentArr(new String[]{"abc0000000000000000000001000000000000000000000def29"})
    );
}
Source: https://stackoverflow.com/questions/65262914/how-to-split-a-string-delimited-on-if-substring-can-be-casted-as-an-int