Problems with a shunting yard algorithm

蓝咒 提交于 2019-11-28 10:21:19

It sounds like you're doing a lex-then-parse style parser, where you're going to need a simple state machine in the lexer in order to get separate tokens for unary and binary minus. (In a PEG parser, this isn't something you have to worry about.)

In JavaCC, you would have a DEFAULT state, where you would consider the - character to be UNARY_MINUS. When you tokenized the end of a primary expression (either a closing paren, or an integer, based on the examples you gave), then you would switch to the INFIX state where - would be considered to be INFIX_MINUS. Once you encountered any infix operator, you would return to the DEFAULT state.

If you're rolling your own, it might be a bit simpler than that. Look at this Python code for a clever way of doing it. Basically, when you encounter a -, you just check to see if the previous token was an infix operator. That example uses the string "-u" to represent the unary minus token, which is convenient for an informal tokenization. Best I can tell, the Python example does fail to handle case where a - follows an open paren, or comes at the beginning of the input. Those should be considered unary as well.

In order for unary minus to be handled correctly in the shunting-yard algorithm itself, it needs to have higher precedence than any of the infix operators, and it needs to marked as right-associative. (Make sure you handle right-associativity. You may have left it out since the rest of your operators are left-associative.) This is clear enough in the Python code (although I would use some kind of struct rather than two separate maps).

When it comes time to evaluate, you will need to handle unary operators a little differently, since you only need to pop one number off the stack, rather than two. Depending on what your implementation looks like, it may be easier to just go through the list and replace every occurrence of "-u" with [-1, "*"].

If you can follow Python at all, you should be able to see everything I'm talking about in the example I linked to. I find the code to be a bit easier to read than the C version that someone else mentioned. Also, if you're curious, I did a little write-up a while back about using shunting-yard in Ruby, but I handled unary operators as a separate nonterminal, so they are not shown.

Don Roby

The answers to this question might be helpful.

In particular, one of those answers references a solution in C that handles unary minus.

Basically, you have to recognize a unary minus based on the appearance of the minus sign in positions where a binary operator can't be, and make a different token for it, as it has different precedence.

Dijkstra's original paper doesn't too clearly explain how he dealt with this, but the unary minus was listed as a separate operator.

In your lexer, you can implement this pseudo-logic:

if (symbol == '-') {
    if (previousToken is a number 
     OR previousToken is an identifier 
     OR previousToken is a function) {
        currentToken = SUBTRACT;
    } else {
        currentToken = NEGATION;
    }
}

You can set up negation to have a precedence higher than multiply and divide, but lower than exponentiation. You can also set it up to be right associative (just like '^'). Then you just need to integrate the precedence and associativity into the algorithm as described on Wikipedia's page.

If the token is an operator, o1, then: while there is an operator token, o2, at the top of the stack, and either o1 is left-associative and its precedence is less than or equal to that of o2, or o1 has precedence less than that of o2, pop o2 off the stack, onto the output queue; push o1 onto the stack.

I ended up implementing this corresponding code:

} else if (nextToken instanceof Operator) {
    final Operator o1 = (Operator) nextToken;

    while (!stack.isEmpty() && stack.peek() instanceof Operator) {
        final Operator o2 = (Operator) stack.peek();

        if ((o1.associativity == Associativity.LEFT && o1.precedence <= o2.precedence)
         || (o1.associativity == Associativity.RIGHT && o1.precedence < o2.precedence)) {
            popStackTopToOutput();
        } else {
            break;
        }
    }

    stack.push(nextToken);
}

Austin Taylor is quite right that you only need to pop off one number for a unary operator:

if (token is operator negate) {
    operand = pop;
    push operand * -1;
}

Example project:

https://github.com/Digipom/Calculator-for-Android/

Further reading:

http://en.wikipedia.org/wiki/Shunting-yard_algorithm

http://sankuru.biz/blog/1-parsing-object-oriented-expressions-with-dijkstras-shunting-yard-algorithm

I know it's an old post, but may be someone will find it useful . I implemented this algorithm before, starting by toknizer using StreamTokenizer class and it works fine. In StreamTokenizer in Java, there are some character with specific meaning. For example: ( is an operator, sin is a word,... For your question, There is a method called "streamToknizer.ordinaryChar(..)" which it specifies that the character argument is "ordinary" in this tokenizer. It removes any special significance the character has as a comment character, word component, string delimiter, white space, or number character. Source here

So you can define - as ordinary character which means, it won't be considered as a sign for number.For example, if you have expression 2-3 , You will have [2,-,3], but if you didn't specify it as ordinary, so it will be [2,-3]

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!