Java - Split String by Number and Letters

Submitted by 馋奶兔 on 2019-11-29 01:48:12

You could try this approach:

String formula = "C3H20IO";

//insert "1" at each atom-atom boundary and after a trailing atom
formula = formula.replaceAll("(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");

//split at each letter-digit or digit-letter boundary
String regex = "(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)";
String[] atoms = formula.split(regex);

Output:

atoms: [C, 3, H, 20, I, 1, O, 1]

Now all even indices (0, 2, 4, ...) hold atoms and the odd ones hold the associated counts:

String[] a = new String[ atoms.length/2 ];
int[] n = new int[ atoms.length/2 ];

for(int i = 0 ; i < a.length ; i++) {
    a[i] = atoms[i*2];
    n[i] = Integer.parseInt(atoms[i*2+1]);
}

Output:

a: [C, H, I, O]
n: [3, 20, 1, 1]
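To see why the split in this approach lines up into symbol/count pairs, it may help to print the intermediate string that replaceAll produces. The sketch below wraps the two steps in a hypothetical class `FormulaSplitDemo`; the class and method names are illustrative assumptions, not part of the original answer.

```java
import java.util.Arrays;

public class FormulaSplitDemo {
    // Inserts an explicit "1" wherever an element symbol has no count after it.
    public static String insertImplicitOnes(String formula) {
        return formula.replaceAll(
            "(?<=[A-Z])(?=[A-Z])|(?<=[a-z])(?=[A-Z])|(?<=\\D)$", "1");
    }

    // Splits at every letter-digit or digit-letter boundary.
    public static String[] splitAtoms(String formula) {
        return insertImplicitOnes(formula)
            .split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(insertImplicitOnes("C3H20IO"));           // C3H20I1O1
        System.out.println(Arrays.toString(splitAtoms("C3H20IO")));  // [C, 3, H, 20, I, 1, O, 1]
    }
}
```

Because every symbol is followed by a count after the first step, the resulting array always has an even length, which is what the index arithmetic relies on.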

You can use a regular expression to slide over your input using the Matcher.find() method.

Here is a rough example of what it may look like:

    String input = "C3H20IO";

    List<String> array1 = new ArrayList<String>();
    List<Integer> array2 = new ArrayList<Integer>();

    Pattern pattern = Pattern.compile("([A-Z][a-z]*)([0-9]*)");
    Matcher matcher = pattern.matcher(input);               
    while(matcher.find()){
        array1.add(matcher.group(1));

        String atomAmount = matcher.group(2);
        int atomAmountInt = 1;
        if((atomAmount != null) && (!atomAmount.isEmpty())){
            atomAmountInt = Integer.valueOf(atomAmount);
        }
        array2.add(atomAmountInt);
    }

I know, the conversion from List to Array is missing, but it should give you an idea of how to approach your problem.
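For the missing List-to-array step, one possible sketch is shown here. The class `ListToArrayDemo` and its helper names are hypothetical; `List.toArray` and `Stream.mapToInt` are standard library calls.

```java
import java.util.Arrays;
import java.util.List;

public class ListToArrayDemo {
    // List<String> -> String[]
    public static String[] toStringArray(List<String> list) {
        return list.toArray(new String[0]);
    }

    // List<Integer> -> int[] (unboxes each element)
    public static int[] toIntArray(List<Integer> list) {
        return list.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        List<String> array1 = Arrays.asList("C", "H", "I", "O");
        List<Integer> array2 = Arrays.asList(3, 20, 1, 1);

        System.out.println(Arrays.toString(toStringArray(array1))); // [C, H, I, O]
        System.out.println(Arrays.toString(toIntArray(array2)));    // [3, 20, 1, 1]
    }
}
```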

An approach without regex, storing the data in ArrayLists (note that it treats every letter as a separate element, so it only handles single-letter symbols):

String s = "C3H20IO";

char chem = '-';          // current element symbol (single letter)
String val = "";          // digits collected for the current element
boolean isFirst = true;
List<Character> chemList = new ArrayList<Character>();
List<Integer> weightList = new ArrayList<Integer>();
for (char c : s.toCharArray()) {
    if (Character.isLetter(c)) {
        if (!isFirst) {
            // a new letter ends the previous element; default its count to 1
            chemList.add(chem);
            weightList.add(Integer.valueOf(val.equals("") ? "1" : val));
            val = "";
        }
        chem = c;
    } else if (Character.isDigit(c)) {
        val += c;
    }
    isFirst = false;
}
// store the last element
chemList.add(chem);
weightList.add(Integer.valueOf(val.equals("") ? "1" : val));

System.out.println(chemList);
System.out.println(weightList);

OUTPUT:

[C, H, I, O]
[3, 20, 1, 1]
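The regex-free loop above stores one char per element, so a symbol such as "Zn" would be split in two. A possible variant that keeps multi-letter symbols together is sketched below, assuming every symbol starts with an uppercase letter followed by lowercase letters. The class `NoRegexFormulaParser` and its methods are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class NoRegexFormulaParser {
    // Fills 'symbols' and 'counts' in parallel; a missing count defaults to 1.
    public static void parse(String s, List<String> symbols, List<Integer> counts) {
        StringBuilder symbol = new StringBuilder();
        StringBuilder digits = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isUpperCase(c)) {
                // an uppercase letter starts a new element: store the previous one
                flush(symbol, digits, symbols, counts);
                symbol.append(c);
            } else if (Character.isLowerCase(c)) {
                symbol.append(c);   // continuation of the current symbol
            } else if (Character.isDigit(c)) {
                digits.append(c);
            }
        }
        flush(symbol, digits, symbols, counts);  // store the last element
    }

    private static void flush(StringBuilder symbol, StringBuilder digits,
                              List<String> symbols, List<Integer> counts) {
        if (symbol.length() > 0) {
            symbols.add(symbol.toString());
            counts.add(digits.length() == 0 ? 1 : Integer.parseInt(digits.toString()));
        }
        symbol.setLength(0);
        digits.setLength(0);
    }

    public static void main(String[] args) {
        List<String> symbols = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        parse("C3H20ZnO2", symbols, counts);
        System.out.println(symbols); // [C, H, Zn, O]
        System.out.println(counts);  // [3, 20, 1, 2]
    }
}
```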

This works assuming each element symbol starts with a capital letter and continues in lowercase, i.e. iron is written "Fe", not "FE". Basically, you split the string before each capital letter, then split each piece into its letters and its number, adding "1" if the piece contains no number.

        String s = "C3H20IO";
        List<String> letters = new ArrayList<>();
        List<String> numbers = new ArrayList<>();

        String[] arr = s.split("(?=\\p{Upper})");  // [C3, H20, I, O]
        for (String str : arr) {  //[C, 3]:[H, 20]:[I]:[O]
            String[] temp = str.split("(?=\\d)", 2);
            letters.add(temp[0]);
            if (temp.length == 1) {
                numbers.add("1");
            } else {
                numbers.add(temp[1]);
            }
        }
        System.out.println(letters); //[C, H, I, O]
        System.out.println(numbers); //[3, 20, 1, 1]
Thesoham24

Loop over each character of the input and sort it by type:

for (char c : input.toCharArray()) {
    if (Character.isDigit(c)) {
        // add it to the number array
    } else if (Character.isLetter(c)) {
        // add it to the character array
    }
}

I suggest splitting by uppercase letter using zero-width lookahead regex (to extract items like C12, O2, Si), then split each item into element and its numeric weight:

List<String> elements = new ArrayList<>();
List<Integer> weights = new ArrayList<>();

String[] items = "C6H12Si6OH".split("(?=[A-Z])");  // [C6, H12, Si6, O, H]
for (String item : items) {
    String[] pair = item.split("(?=[0-9])", 2);    // e.g. H12 => [H, 12], O => [O]
    elements.add(pair[0]);
    weights.add(pair.length > 1 ? Integer.parseInt(pair[1]) : 1);
}
System.out.println(elements);  // [C, H, Si, O, H]
System.out.println(weights);   // [6, 12, 6, 1, 1]

Is this good? (Not using split)


String line = "C3H20ZnO2ABCD";
String pattern = "([A-Z][a-z]*)(((?=[A-Z][a-z]*|$))|\\d+)";

Pattern r = Pattern.compile(pattern);

Matcher m = r.matcher(line);

while (m.find()) {
    System.out.print(m.group(1));
    if (m.group(2).length() == 0) {
        System.out.println(" 1");
    } else {
        System.out.println(" " + m.group(2));
    }
}


You can split the string with a regular expression like (?<=\D)(?=\d), which matches the position between a non-digit and a digit. Try this:

String alphanum= "abcd1234";
String[] part = alphanum.split("(?<=\\D)(?=\\d)");
System.out.println(part[0]);
System.out.println(part[1]);

will output

abcd
1234
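Note that (?<=\D)(?=\d) only matches where a digit follows a non-digit, so on a formula it leaves each number attached to the letters after it. The sketch below compares it with the two-way boundary pattern; the class `BoundarySplitDemo` and its method names are hypothetical.

```java
import java.util.Arrays;

public class BoundarySplitDemo {
    // Splits only where a digit follows a non-digit.
    public static String[] splitBeforeDigits(String s) {
        return s.split("(?<=\\D)(?=\\d)");
    }

    // Splits at both letter-digit and digit-letter boundaries.
    public static String[] splitBothWays(String s) {
        return s.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBeforeDigits("abcd1234"))); // [abcd, 1234]
        System.out.println(Arrays.toString(splitBeforeDigits("C3H20IO")));  // [C, 3H, 20IO]
        System.out.println(Arrays.toString(splitBothWays("C3H20IO")));      // [C, 3, H, 20, IO]
    }
}
```

Even the two-way pattern keeps "IO" together, since there is no digit between the two symbols; separating them needs an extra rule such as splitting before uppercase letters.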

I did it as follows:

ArrayList<Integer> integerCharacters = new ArrayList<>();
ArrayList<String> stringCharacters = new ArrayList<>();

String value = "C3H20IO"; //Your value 
String[] strSplitted = value.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)"); //Split numeric and strings

for(int i=0; i<strSplitted.length; i++){

    if (Character.isLetter(strSplitted[i].charAt(0))){
        stringCharacters.add(strSplitted[i]); //If string then add to strings array
    }
    else{
        integerCharacters.add(Integer.parseInt(strSplitted[i])); //else add to integer array
    }
}

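You can use two patterns:

  • [0-9]+
  • [a-zA-Z]+

Split once by each of them and drop the empty strings. Note that adjacent symbols without a count between them (such as "IO") stay together as one item.

String test = "C3H20IO";
// "+" treats a run of digits (or letters) as a single separator
List<String> letters = Arrays.stream(test.split("[0-9]+"))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toCollection(ArrayList::new));   // [C, H, IO]
List<String> numbers = Arrays.stream(test.split("[a-zA-Z]+"))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toCollection(ArrayList::new));   // [3, 20]

// a trailing letter run has no number after it, so give it a count of 1
if (letters.size() != numbers.size()) {
    numbers.add("1");                                            // [3, 20, 1]
}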