So I have, for example, a string such as this: C3H20IO
What I wanna do is split this string so I get the following:
Array1 = {C,H,I,O}
Ar
Is this good? (Not using split)
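import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "C3H20ZnO2ABCD";
// group 1: element symbol
// group 2: the digits, or an empty match when another element (or the end) follows directly
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find()) {
    System.out.print(m.group(1));
    if (m.group(2).isEmpty()) {
        System.out.println(" 1"); // no count written means one atom
    } else {
        System.out.println(" " + m.group(2));
    }
}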
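For input made up only of element symbols and digits, the lookahead branch of group 2 can only ever match an empty string, so a plain optional-digits group should produce the same pairs. Here is a small sketch, assuming Java 8+, that runs both patterns over the same string so you can compare them; `PatternCompare` and `pairs` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for comparing the two patterns
class PatternCompare {
    // Pattern from the answer above: count is digits or an empty lookahead match
    static final Pattern LOOKAHEAD = Pattern.compile("([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+)");
    // Simpler alternative, assuming the input holds only letters and digits
    static final Pattern OPTIONAL_DIGITS = Pattern.compile("([A-Z][a-z]*)(\\d*)");

    // Returns "symbol:count" entries, treating an empty count as 1
    static List<String> pairs(Pattern p, String formula) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(formula);
        while (m.find()) {
            String count = m.group(2).isEmpty() ? "1" : m.group(2);
            result.add(m.group(1) + ":" + count);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "C3H20ZnO2ABCD";
        System.out.println(pairs(LOOKAHEAD, line));       // [C:3, H:20, Zn:1, O:2, A:1, B:1, C:1, D:1]
        System.out.println(pairs(OPTIONAL_DIGITS, line)); // same list
    }
}
```

On input that contains other characters, such as parentheses, the two patterns can behave differently, so the lookahead version is not redundant in every case.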
You could try this approach:
String formula = "C3H20IO";
//insert "1" at every atom-atom boundary and after a trailing atom
formula = formula.replaceAll("(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
// formula is now "C3H20I1O1"
//split at letter-digit or digit-letter boundary
String regex = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";
String[] atoms = formula.split(regex);
System.out.println("atoms: " + java.util.Arrays.toString(atoms));
Output:
atoms: [C, 3, H, 20, I, 1, O, 1]
Now all even indices (0, 2, 4, ...) are atoms and the odd ones are the associated counts:
String[] a = new String[atoms.length / 2];
int[] n = new int[atoms.length / 2];
for (int i = 0; i < a.length; i++) {
    a[i] = atoms[i * 2];
    n[i] = Integer.parseInt(atoms[i * 2 + 1]);
}
Output:
a: [C, H, I, O]
n: [3, 20, 1, 1]
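The three steps above (insert "1", split at letter/digit boundaries, then pair even and odd indices) can be wrapped into one method. A minimal sketch, assuming Java 8+ and a well-formed formula of symbols and digits; `InsertOneThenSplit`, `Result` and `parse` are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical wrapper around the insert-"1"-then-split approach
class InsertOneThenSplit {
    // Hypothetical holder for the two parallel arrays
    static final class Result {
        final String[] atoms;
        final int[] counts;
        Result(String[] atoms, int[] counts) { this.atoms = atoms; this.counts = counts; }
    }

    static Result parse(String formula) {
        // Step 1: make every count explicit ("C3H20IO" -> "C3H20I1O1")
        String explicit = formula.replaceAll(
            "(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
        // Step 2: split at letter->digit and digit->letter boundaries
        String[] tokens = explicit.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
        // Step 3: even indices are symbols, odd indices are counts
        String[] atoms = new String[tokens.length / 2];
        int[] counts = new int[tokens.length / 2];
        for (int i = 0; i < atoms.length; i++) {
            atoms[i] = tokens[2 * i];
            counts[i] = Integer.parseInt(tokens[2 * i + 1]);
        }
        return new Result(atoms, counts);
    }

    public static void main(String[] args) {
        Result r = parse("C3H20IO");
        System.out.println(Arrays.toString(r.atoms));  // [C, H, I, O]
        System.out.println(Arrays.toString(r.counts)); // [3, 20, 1, 1]
    }
}
```

The same method handles two-letter symbols, since the `(?<=[a-z])(?=[A-Z])` branch inserts the "1" after, for example, the "a" of "NaCl".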
I suggest splitting before each uppercase letter using a zero-width lookahead regex (to extract items like C12, O2, Si), then splitting each item into the element and its numeric weight:
List<String> elements = new ArrayList<>();
List<Integer> weights = new ArrayList<>();
String[] items = "C6H12Si6OH".split("(?=[A-Z])"); // [C6, H12, Si6, O, H]
for (String item : items) {
    String[] pair = item.split("(?=[0-9])", 2); // e.g. H12 => [H, 12], O => [O]
    elements.add(pair[0]);
    weights.add(pair.length > 1 ? Integer.parseInt(pair[1]) : 1);
}
System.out.println(elements); // [C, H, Si, O, H]
System.out.println(weights); // [6, 12, 6, 1, 1]
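The output above keeps both H entries of C6H12Si6OH separate. If you would rather have one total per element, one option (not part of the answer above) is to sum the weights into a `LinkedHashMap`, which keeps the order in which elements first appear. A sketch assuming Java 8+ (for `Map.merge` and the leading-empty-string behaviour of `split`); `ElementTotals` and `totals` are hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that merges repeated elements into one count each
class ElementTotals {
    static Map<String, Integer> totals(String formula) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (String item : formula.split("(?=[A-Z])")) { // e.g. [C6, H12, Si6, O, H]
            String[] pair = item.split("(?=[0-9])", 2);  // e.g. H12 => [H, 12]
            int weight = pair.length > 1 ? Integer.parseInt(pair[1]) : 1;
            totals.merge(pair[0], weight, Integer::sum);  // add to any earlier count
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(totals("C6H12Si6OH")); // {C=6, H=13, Si=6, O=1}
    }
}
```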
You can use a regular expression to slide over your input using the Matcher.find() method.
Here is a rough example of what it may look like:
String input = "C3H20IO";
List<String> array1 = new ArrayList<String>();
List<Integer> array2 = new ArrayList<Integer>();
Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    array1.add(matcher.group(1));
    String atomAmount = matcher.group(2);
    int atomAmountInt = 1;
    if ((atomAmount != null) && (!atomAmount.isEmpty())) {
        atomAmountInt = Integer.valueOf(atomAmount);
    }
    array2.add(atomAmountInt);
}
I know the conversion from List to array is missing, but this should give you an idea of how to approach your problem.
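For the missing step, a minimal sketch of turning the two lists into arrays, assuming Java 8+ for the stream call; `ListToArrays`, `toStringArray` and `toIntArray` are hypothetical names, and the lists in `main` are stand-ins for what the loop above builds from "C3H20IO":

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing List -> array conversion
class ListToArrays {
    // List<String> -> String[]; toArray allocates an array of the right size
    static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // List<Integer> -> int[]; each Integer is unboxed through an IntStream
    static int[] toIntArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // Stand-ins for array1 and array2 after the find() loop has run
        List<String> array1 = Arrays.asList("C", "H", "I", "O");
        List<Integer> array2 = Arrays.asList(3, 20, 1, 1);
        System.out.println(Arrays.toString(toStringArray(array1))); // [C, H, I, O]
        System.out.println(Arrays.toString(toIntArray(array2)));    // [3, 20, 1, 1]
    }
}
```

A `List<Integer>` cannot be turned into an `int[]` with `toArray` alone, which is why the stream step is used for the counts.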
You can split the string by using a regular expression like (?<=\D)(?=\d). Try this:
String alphanum= "abcd1234";
String[] part = alphanum.split("(?<=\\D)(?=\\d)");
System.out.println(part[0]);
System.out.println(part[1]);
will output:
abcd
1234
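Applied to a whole formula, this single-direction split only cuts where a non-digit is followed by a digit, so each count stays glued to the next symbol. A small sketch comparing it with the two-direction split from the insert-"1" answer; `BoundarySplit` and its methods are hypothetical names:

```java
import java.util.Arrays;

// Hypothetical helper comparing one- and two-direction boundary splits
class BoundarySplit {
    // Cuts only where a non-digit is followed by a digit
    static String[] letterToDigit(String s) {
        return s.split("(?<=\\D)(?=\\d)");
    }

    // Cuts at letter->digit and digit->letter boundaries
    static String[] bothDirections(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(letterToDigit("C3H20IO")));  // [C, 3H, 20IO]
        System.out.println(Arrays.toString(bothDirections("C3H20IO"))); // [C, 3, H, 20, IO]
    }
}
```

Neither version separates I from O, because there is no digit between them; that is why the other answers either insert a "1" first or split before each uppercase letter.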
This works assuming each element symbol starts with a capital letter, i.e. iron is written "Fe" in the String, not "FE". Basically, you split the string before each capital letter, then split each piece into its letters and its number, adding "1" if the piece contains no number.
String s = "C3H20IO";
List<String> letters = new ArrayList<>();
List<String> numbers = new ArrayList<>();
String[] arr = s.split("(?=\\p{Upper})"); // [C3, H20, I, O]
for (String str : arr) { // [C, 3]:[H, 20]:[I]:[O]
    String[] temp = str.split("(?=\\d)", 2);
    letters.add(temp[0]);
    if (temp.length == 1) {
        numbers.add("1");
    } else {
        numbers.add(temp[1]);
    }
}
System.out.println(letters); // [C, H, I, O]
System.out.println(numbers); // [3, 20, 1, 1]