Parsing a string that represents a chemical reaction and verifying whether the reaction is possible

独厮守ぢ 2021-01-13 01:55

I have to write a program that takes a user's chemical equation as input, like 12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2, and checks whether the number of atoms on both sides is the

3 answers
  •  天命终不由人
    2021-01-13 02:24

    This question is asking for a simple parser for a simple type of equation. I am assuming that you do not need to support all kinds of irregular equations with parentheses and weird symbols.

    Just to be safe, I would use a lot of String.split() instead of regexes.

    A (relatively) simple solution would do the following:

    1. Split on ->
    2. Make sure there are two pieces
    3. Sum up each piece:
      1. Split on +
      2. Parse each molecule and sum up the atoms:
        1. Parse optional multiplier
        2. Find all matches to molecule regex
        3. Convert the numbers and add them up by element
    4. Compare the results
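
Steps 1, 2 and 3.1 above can be sketched on their own. One caveat: String.split() itself takes a regular expression, so the + separator has to be escaped as "\\+". This is only a minimal sketch, assuming exactly one -> and no parentheses; the class name SplitSketch and both helper methods are made up for illustration and are not part of the full class.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical helper: splits an equation into its left and right side.
    static String[] splitSides(String eqn) {
        String[] sides = eqn.split("->");
        if (sides.length != 2) {
            throw new IllegalArgumentException("Expected exactly one -> in: " + eqn);
        }
        return sides;
    }

    // Hypothetical helper: splits one side into trimmed molecule strings.
    // "+" is a regex metacharacter, hence the escaped "\\+".
    static List<String> splitMolecules(String side) {
        List<String> molecules = new ArrayList<>();
        for (String molecule : side.split("\\+")) {
            molecules.add(molecule.trim());
        }
        return molecules;
    }

    public static void main(String[] args) {
        String[] sides = splitSides("12 CO2 + 6 H2O -> 2 C6H12O6 + 12 O2");
        System.out.println(splitMolecules(sides[0])); // [12 CO2, 6 H2O]
        System.out.println(splitMolecules(sides[1])); // [2 C6H12O6, 12 O2]
    }
}
```

Each molecule string still carries its optional multiplier, which is what the molecule-level parsing has to deal with next.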

    Each level of parsing can be handily done in a separate method. Using regex is probably the best way to parse the individual molecules, so I borrowed the expression from here: https://codereview.stackexchange.com/questions/2345/simplify-splitting-a-string-into-alpha-and-numeric-parts. The regex is pretty much trivial, so please bear with me:

    import java.util.Map;
    import java.util.HashMap;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    
    public class SimpleChemicalEquationParser
    {
        // Counts of elements on each side
        private Map<String, Integer> left;
        private Map<String, Integer> right;
    
        public SimpleChemicalEquationParser(String eqn)
        {
            this.left = new HashMap<>();
            this.right = new HashMap<>();
            parse(eqn);
        }
    
        public boolean isBalanced()
        {
            return left.equals(right);
        }
    
        public boolean isSimpleBalanced()
        {
            return leftCount() == rightCount();
        }
    
        public int leftCount()
        {
            return left.values().stream().mapToInt(Integer::intValue).sum();
        }
    
        public int rightCount()
        {
            return right.values().stream().mapToInt(Integer::intValue).sum();
        }
    
        private void parse(String eqn)
        {
            String[] sides = eqn.split("->");
            if(sides.length != 2) {
                throw new RuntimeException("Check your equation. There should be exactly one -> symbol somewhere");
            }
            parseSide(sides[0], this.left);
            parseSide(sides[1], this.right);
        }
    
        private void parseSide(String side, Map<String, Integer> counter)
        {
            String[] molecules = side.split("\\+");
            for(String molecule : molecules) {
                parseMolecule(molecule, counter);
            }
        }
    
        private void parseMolecule(String molecule, Map<String, Integer> counter)
        {
            molecule = molecule.trim();
            Matcher matcher = Pattern.compile("([a-zA-Z]+)\\s*([0-9]*)").matcher(molecule);
            int multiplier = 1;
            int endIndex = 0;
            while(matcher.find()) {
                String separator = molecule.substring(endIndex, matcher.start()).trim();
                if(!separator.isEmpty()) {
                    // Check if there is a premultiplier before the first element
                    if(endIndex == 0) {
                        String multiplierString = molecule.substring(0, matcher.start()).trim();
                        try {
                            multiplier = Integer.parseInt(multiplierString);
                        } catch(NumberFormatException nfe) {
                            throw new RuntimeException("Invalid prefix \"" + multiplierString +
                                                       "\" to molecule \"" + molecule.substring(matcher.start()) + "\"");
                        }
                    } else {
                        throw new RuntimeException("Nonsensical characters \"" + separator +
                                                   "\" in molecule \"" + molecule + "\"");
                    }
                }
                parseElement(multiplier, matcher.group(1), matcher.group(2), counter);
                endIndex = matcher.end();
            }
            if(endIndex != molecule.length()) {
                throw new RuntimeException("Invalid end to side: \"" + molecule.substring(endIndex) + "\"");
            }
        }
    
        private void parseElement(int multiplier, String element, String atoms, Map<String, Integer> counter)
        {
            if(!atoms.isEmpty())
                multiplier *= Integer.parseInt(atoms);
            if(counter.containsKey(element))
                multiplier += counter.get(element);
            counter.put(element, multiplier);
        }
    
        public static void main(String[] args)
        {
            // Collect all command line arguments into one equation
            StringBuilder sb = new StringBuilder();
            for(String arg : args)
                sb.append(arg).append(' ');
    
            String eqn = sb.toString();
            SimpleChemicalEquationParser parser = new SimpleChemicalEquationParser(eqn);
            boolean simpleBalanced = parser.isSimpleBalanced();
            boolean balanced = parser.isBalanced();
    
            System.out.println("Left: " + parser.leftCount());
            for(Map.Entry<String, Integer> entry : parser.left.entrySet()) {
                System.out.println("    " + entry.getKey() + ": " + entry.getValue());
            }
            System.out.println();
    
            System.out.println("Right: " + parser.rightCount());
            for(Map.Entry<String, Integer> entry : parser.right.entrySet()) {
                System.out.println("    " + entry.getKey() + ": " + entry.getValue());
            }
            System.out.println();
    
            System.out.println("Atom counts match: " + simpleBalanced);
            System.out.println("Elements match: " + balanced);
        }
    }
    

    All the work is done by the parse method and its subordinates, which form a sort of call tree. Since this approach makes it especially easy to check that the atoms of each element actually balance out, I have gone ahead and done that here. This class prints the counts of the atoms on each side of the equation, whether or not the raw totals match, and whether or not the counts match element by element. Here are a couple of example runs:

    OP's original example:

    $ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O -> 2 C6H12O6 + 12 O2'
    Left: 54
        C: 12
        H: 12
        O: 30
    
    Right: 72
        C: 12
        H: 24
        O: 36
    
    Atom counts match: false
    Elements match: false
    

    Added ozone to make the total number of atoms match up (the per-element counts still differ):

    $ java -cp . SimpleChemicalEquationParser '12 C O2 + 6 H2O + 6 O3 -> 2 C6H12O6 + 12 O2'
    Left: 72
        C: 12
        H: 12
        O: 48
    
    Right: 72
        C: 12
        H: 24
        O: 36
    
    Atom counts match: true
    Elements match: false 
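
The run above shows why comparing totals alone is not enough: both sides have 72 atoms, yet hydrogen and oxygen differ. A small sketch of the two checks in isolation, using the counts copied from that run (the class name BalanceCheckSketch and the total helper are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class BalanceCheckSketch {
    // Hypothetical helper: sum of all atom counts, regardless of element.
    static int total(Map<String, Integer> counts) {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Per-element counts taken from the ozone run.
        Map<String, Integer> left = new HashMap<>();
        left.put("C", 12);
        left.put("H", 12);
        left.put("O", 48);

        Map<String, Integer> right = new HashMap<>();
        right.put("C", 12);
        right.put("H", 24);
        right.put("O", 36);

        // Totals agree (72 == 72), which is the "simple" check.
        System.out.println(total(left) == total(right)); // true
        // Map.equals compares element by element, which is the real balance check.
        System.out.println(left.equals(right));          // false
    }
}
```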
    

    Added water to make everything match up

    $ java -cp . SimpleChemicalEquationParser '12 C O2 + 12 H2O -> 2 C6H12O6 + 12 O2'
    Left: 72
        C: 12
        H: 24
        O: 36
    
    Right: 72
        C: 12
        H: 24
        O: 36
    
    Atom counts match: true
    Elements match: true
    

    Notice that I added a space between C and O in CO2. This is because my current regex for molecules, ([a-zA-Z]+)\\s*([0-9]*), allows any run of letters to represent an element, so CO2 without the space would be read as two atoms of a single element named CO. If your elements are always going to be simple one-letter elements, change this to ([a-zA-Z])\\s*([0-9]*) (remove the + quantifier). If they are going to be properly written symbols, an uppercase letter optionally followed by one lowercase letter, do this instead: ([A-Z][a-z]?)\\s*([0-9]*). I recommend the latter option. For both modified versions, the space in C O2 will no longer be necessary.
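
To illustrate the recommended pattern, here is a minimal sketch that counts the atoms in a single formula with ([A-Z][a-z]?)\\s*([0-9]*). The class name ElementRegexSketch and the countAtoms helper are hypothetical. Unlike the full parser, this sketch handles no leading multiplier and does not reject leftover characters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementRegexSketch {
    // One uppercase letter, an optional lowercase letter, then an optional count.
    private static final Pattern ELEMENT = Pattern.compile("([A-Z][a-z]?)\\s*([0-9]*)");

    // Hypothetical helper: counts atoms in one formula such as "C6H12O6".
    // A missing count means one atom; repeated elements are summed.
    static Map<String, Integer> countAtoms(String formula) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = ELEMENT.matcher(formula);
        while (m.find()) {
            int n = m.group(2).isEmpty() ? 1 : Integer.parseInt(m.group(2));
            counts.merge(m.group(1), n, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countAtoms("CO2"));     // {C=1, O=2}
        System.out.println(countAtoms("C6H12O6")); // {C=6, H=12, O=6}
        System.out.println(countAtoms("NaCl"));    // {Na=1, Cl=1}
    }
}
```

Because the second letter must be lowercase, CO is read as carbon plus oxygen while Co would be read as a single two-letter symbol.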
