How Would I Go About Implementing a Simple Stack-Based Programming Language?

Question

I am interested in extending my knowledge of computer programming by implementing a stack-based programming language. I am seeking advice on where to begin. I intend for it to have instructions like "pushint 1", which would push an integer with value 1 onto the top of the stack, and flow control via labels like "L01: jump L01:".
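One way to handle labels such as "L01:" without a parse tree is a two-pass "assembler". The first pass records which instruction index each label points at. The second pass splits each line into an opcode and an operand and replaces label operands with those indices.

The sketch below rests on several assumptions. It expects one instruction per line and [bracketed] comments that never appear inside string literals. An optional "LABEL:" may precede an instruction on the same line, and other operands are kept as raw text. The function name assemble is hypothetical. Label operands may be written with or without a trailing colon, so both "jump L01:" and "iftrue L01" are accepted.

```javascript
// Hypothetical two-pass assembler for the instruction format described above.
// Assumptions: one instruction per line, [comments] never inside strings,
// operands other than labels are left as raw text.
function assemble(source) {
  const labels = new Map(); // label name -> index of the instruction after it
  const body = [];          // instruction text, one entry per instruction

  // Pass 1: strip comments, record label positions, collect instructions.
  for (const raw of source.split("\n")) {
    let line = raw.replace(/\[[^\]]*\]/g, "").trim();
    const m = line.match(/^(\w+):\s*(.*)$/); // optional leading "LABEL:"
    if (m) {
      labels.set(m[1], body.length);
      line = m[2];
    }
    if (line !== "") body.push(line);
  }

  // Pass 2: split into opcode + operand, resolving label operands to indices.
  return body.map((line) => {
    const [op, ...rest] = line.split(/\s+/);
    const arg = rest.join(" ").replace(/:$/, ""); // accept "L01" or "L01:"
    if (arg === "") return { op };
    return labels.has(arg) ? { op, arg: labels.get(arg) } : { op, arg };
  });
}

console.log(assemble("L01: jump L01:")); // -> [{ op: "jump", arg: 0 }]
```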

So far I have made a C# implementation of how I want my language to behave (I wanted to link to it, but Ideone is blocked). It is very messy and needs optimization: it translates the input to XML and then parses it. My goal is to move to a lower-level language (perhaps C/C++), but my problem is implementing a stack that can hold various data types and does not have a fixed size.
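Neither problem is specific to C. A common approach, assumed here and not taken from the question's code, is to store each stack slot as a tagged value: a type tag plus a payload. The backing array grows by doubling when it fills up.

JavaScript arrays already grow on their own. The sketch below manages capacity explicitly only to mirror what a C version would do with a struct holding an enum tag and a union, resized with realloc. The class name ValueStack and the tag strings are illustrative.

```javascript
// Sketch: a growable stack of tagged values. The explicit capacity handling is
// only there to mirror a C version (struct with an enum tag + union, resized
// with realloc); plain JavaScript arrays would grow on their own.
class ValueStack {
  constructor(initialCapacity = 4) {
    this.items = new Array(Math.max(1, initialCapacity)); // backing store
    this.size = 0; // number of slots currently in use
  }

  push(type, value) {
    if (this.size === this.items.length) {
      // Full: allocate twice the space and copy, as realloc would in C.
      const bigger = new Array(this.items.length * 2);
      for (let i = 0; i < this.size; i++) bigger[i] = this.items[i];
      this.items = bigger;
    }
    this.items[this.size++] = { type, value }; // tag + payload
  }

  pop() {
    if (this.size === 0) throw new Error("stack underflow");
    const top = this.items[--this.size];
    this.items[this.size] = undefined; // release the slot
    return top;
  }

  // Pop and check the tag, so e.g. adding a string to an int is caught at run time.
  popAs(type) {
    const { type: actual, value } = this.pop();
    if (actual !== type) throw new TypeError(`expected ${type}, got ${actual}`);
    return value;
  }
}

const stack = new ValueStack(2);
stack.push("char", "\n");
stack.push("string", "Hello World!");
stack.push("int", 1); // third push: capacity grows from 2 to 4
console.log(stack.popAs("int")); // 1
```

Doubling the capacity, rather than growing by one slot at a time, keeps the average cost of a push constant.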

Eventually I would also like to implement arrays and functions. I also think I need a better lexer, and I am wondering whether a parse tree would be a good idea for such a simple language.
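For an instruction language like this one, a parse tree is probably unnecessary. Every line is an opcode followed by at most one operand, so a flat list of tokens carries all the structure there is. A tree would start to pay off only if nested expressions were added later.

The sketch below is a small regex-based lexer. The token kinds and the function name tokenize are assumptions for illustration. Escape sequences such as \n are kept as raw text rather than decoded.

```javascript
// Hypothetical lexer for the question's syntax: returns a flat array of tokens.
// Token kinds (assumed): "label", "word", "int", "char", "string".
function tokenize(source) {
  // Rules are tried in order; a null kind means "match and discard".
  const rules = [
    [/^\s+/, null],                   // whitespace
    [/^\[[^\]]*\]/, null],            // [comment]
    [/^[A-Za-z_]\w*:/, "label"],      // L01:
    [/^-?\d+/, "int"],                // 1, -5
    [/^'(?:\\.|[^'\\])'/, "char"],    // '\n', 'a'
    [/^"(?:\\.|[^"\\])*"/, "string"], // "Hello World!"
    [/^[A-Za-z_]\w*/, "word"],        // pushint, L01
  ];
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    const rule = rules.find(([re]) => re.test(rest));
    if (!rule) throw new SyntaxError(`unexpected input: ${rest.slice(0, 10)}`);
    const text = rest.match(rule[0])[0];
    if (rule[1]) tokens.push({ kind: rule[1], text });
    rest = rest.slice(text.length);
  }
  return tokens;
}

console.log(tokenize("L01: iftrue L01 [loop]").map((t) => t.kind)); // [ 'label', 'word', 'word' ]
```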

Any advice or criticism is welcome. Please keep in mind that I am still fairly new to programming (I have just recently completed AP CompSci I). Links to open-source stack-based languages are also welcome.

Here is a basic program that I would like to interpret or compile (text in square brackets, [like this], is a comment):

[Hello World!]
pushchar    '\n'
pushstring  "Hello World!"
print
[Count to 5 and then count down!]
pushint     1
setlocal    0
L01:
pushchar    '\n'
getlocal    0
print           [print x + '\n']
getlocal    0
increment
setlocal    0   [x = x + 1]
pushint     5
getlocal    0
lessthan        [x < 5]
iftrue      L01
L02:
pushchar    '\n'
getlocal    0
print           [print x + '\n']
getlocal    0
decrement
setlocal    0   [x = x - 1]
pushint     0
getlocal    0
greaterthan     [x > 0]
iftrue      L02

The expected output would be:

Hello World!
1
2
3
4
5
4
3
2
1
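The minimal interpreter sketch below checks that the program above really produces this output. The question does not pin down every instruction's exact behaviour, so the semantics here are assumptions chosen to match the comments in the program:

- print pops values until the stack is empty.
- The comparisons test the top value against the one beneath it.
- iftrue pops a boolean.

The program is hand-encoded as [opcode, operand] pairs to keep the example short. The names run and countProgram are hypothetical.

```javascript
// Hypothetical interpreter for the question's instruction set.
// Assumed semantics (not specified in the question):
//   print                 pops values until the stack is empty, writing each one
//   lessthan/greaterthan  pop a (top) then b, push a < b / a > b
//   iftrue L              pops a value and jumps to label L if it is true
function run(program) {
  // Pass 1: map each label to the index of the instruction that follows it.
  const labels = new Map();
  const code = [];
  for (const instr of program) {
    if (instr[0] === "label") labels.set(instr[1], code.length);
    else code.push(instr);
  }
  const jumpTo = (label) => {
    if (!labels.has(label)) throw new Error(`unknown label: ${label}`);
    return labels.get(label);
  };

  const stack = [];
  const locals = [];
  let out = "";
  let pc = 0; // program counter: index of the next instruction to execute

  while (pc < code.length) {
    const [op, arg] = code[pc++];
    switch (op) {
      case "pushint":
      case "pushchar":
      case "pushstring":
        stack.push(arg);
        break;
      case "print":
        while (stack.length > 0) out += String(stack.pop());
        break;
      case "setlocal":
        locals[arg] = stack.pop();
        break;
      case "getlocal":
        stack.push(locals[arg]);
        break;
      case "increment":
        stack.push(stack.pop() + 1);
        break;
      case "decrement":
        stack.push(stack.pop() - 1);
        break;
      case "lessthan":
      case "greaterthan": {
        const a = stack.pop(); // top of stack, e.g. x
        const b = stack.pop(); // value beneath it, e.g. 5
        stack.push(op === "lessthan" ? a < b : a > b);
        break;
      }
      case "iftrue":
        if (stack.pop() === true) pc = jumpTo(arg);
        break;
      case "jump":
        pc = jumpTo(arg);
        break;
      default:
        throw new Error(`unknown instruction: ${op}`);
    }
  }
  return out;
}

// The question's program, already split into [opcode, operand] pairs.
const countProgram = [
  ["pushchar", "\n"], ["pushstring", "Hello World!"], ["print"],
  ["pushint", 1], ["setlocal", 0],
  ["label", "L01"],
  ["pushchar", "\n"], ["getlocal", 0], ["print"],
  ["getlocal", 0], ["increment"], ["setlocal", 0],
  ["pushint", 5], ["getlocal", 0], ["lessthan"], ["iftrue", "L01"],
  ["label", "L02"],
  ["pushchar", "\n"], ["getlocal", 0], ["print"],
  ["getlocal", 0], ["decrement"], ["setlocal", 0],
  ["pushint", 0], ["getlocal", 0], ["greaterthan"], ["iftrue", "L02"],
];

console.log(run(countProgram));
```

Everything hinges on the program counter pc. Falling through increments it, and jump or iftrue simply assign a new value to it. A C version would likely keep the same shape: a switch inside a while loop over an array of instruction structs.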

Answer 1:

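A stack-based language such as Factor uses postfix syntax like the following. In Factor, the word "." pops and prints the value on top of the stack, whereas "print" expects a string:

2 3 + 5 - .

This is equivalent to the following C-style code: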

print(2 + 3 - 5);

The advantage of a stack-based language is that it's simple to implement. If the language also uses reverse Polish notation, as most stack-based languages do, then all you need for the front end of your language is a lexer. You don't need to parse the tokens into a syntax tree, because there's only one way to decode the stream of tokens.
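The sketch below makes the "only a lexer" point concrete. It evaluates a postfix expression such as 2 3 + 5 - by splitting it on whitespace and acting on each token immediately. The function name evalRPN and the four supported operators are assumptions for illustration.

```javascript
// Evaluate a postfix (RPN) expression with only a whitespace "lexer" and a stack.
// No syntax tree is built: each token is acted on as soon as it is read.
function evalRPN(source) {
  const ops = new Map([
    ["+", (a, b) => a + b],
    ["-", (a, b) => a - b],
    ["*", (a, b) => a * b],
    ["/", (a, b) => a / b],
  ]);
  const stack = [];
  for (const token of source.trim().split(/\s+/)) {
    if (ops.has(token)) {
      if (stack.length < 2) throw new Error(`stack underflow at ${token}`);
      const b = stack.pop(); // right-hand operand was pushed last
      const a = stack.pop();
      stack.push(ops.get(token)(a, b));
    } else {
      const n = Number(token);
      if (Number.isNaN(n)) throw new SyntaxError(`unknown token: ${token}`);
      stack.push(n);
    }
  }
  return stack.pop();
}

console.log(evalRPN("2 3 + 5 -")); // 0
```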

What you're trying to create is not a stack-based programming language but a stack-based virtual machine. Application virtual machines can be either stack-based or register-based. For example, the Java Virtual Machine is stack-based. It executes Java bytecode, which is what you're creating: bytecode for a virtual machine. However, the programming languages that compile to this bytecode (e.g. Java, Scala, Groovy) are not stack-based.
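The sketch below illustrates the distinction. It compiles ordinary infix arithmetic, which is not stack-based, into instructions for a stack machine. This is loosely analogous to (and far simpler than) how javac turns Java source into stack-based JVM bytecode. The instruction names (push, add, sub, mul) and the function compile are assumptions for illustration. The grammar handles only integers, + - * and parentheses, and other characters are ignored by the tokenizing regex.

```javascript
// Compile an infix expression (not a stack-based language) into instructions
// for a hypothetical stack VM: "push N", "add", "sub", "mul".
function compile(source) {
  const tokens = source.match(/\d+|[-+*()]/g) || [];
  let pos = 0;
  const code = [];
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expr := term (('+' | '-') term)*
  function expr() {
    term();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      term();
      code.push(op === "+" ? "add" : "sub"); // both operands are now on the stack
    }
  }

  // term := factor ('*' factor)*
  function term() {
    factor();
    while (peek() === "*") {
      next();
      factor();
      code.push("mul");
    }
  }

  // factor := number | '(' expr ')'
  function factor() {
    const t = next();
    if (t === "(") {
      expr();
      if (next() !== ")") throw new SyntaxError("expected )");
    } else if (/^\d+$/.test(t)) {
      code.push(`push ${t}`);
    } else {
      throw new SyntaxError(`unexpected token: ${t}`);
    }
  }

  expr();
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token: ${peek()}`);
  return code;
}

console.log(compile("2 + 3 - 5")); // [ 'push 2', 'push 3', 'add', 'push 5', 'sub' ]
```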

What you're trying to create is like the assembly language of your own virtual machine, which happens to be stack-based. That being said, it'll be fairly easy to do, because stack-based virtual machines are easier to implement than register-based virtual machines. Again, all you need is a lexer, which you can write by hand or generate with a tool such as flex. Here's a small example in JavaScript using a library called lexer:

var program = "[print(2 + 3)]";
program += "\n push 2";
program += "\n push 3";
program += "\n add";
program += "\n print";

lexer.setInput(program);

var token;
var stack = [];
var push = false;

while (token = lexer.lex()) {
    switch (token) {
    case "NUMBER":
        if (push) stack.push(lexer.yytext);
        else alert("Unexpected number.");
        break;
    case "ADD":
        if (push) alert("Expected number.");
        else stack.push(stack.pop() + stack.pop());
        break;
    case "PRINT":
        if (push) alert("Expected number.");
        else alert(stack.pop());
        break;
    }

    push = token === "PUSH";
}

<script src="https://rawgit.com/aaditmshah/lexer/master/lexer.js"></script>
<script>
var lexer = new Lexer;

lexer.addRule(/\s+/, function () {
    // matched whitespace - discard it
});

lexer.addRule(/\[[^\]]*\]/, function () {
    // matched a comment - discard it
});

lexer.addRule(/\d+/, function (lexeme) {
    this.yytext = parseInt(lexeme, 10);
    return "NUMBER";
});

lexer.addRule(/push/, function () {
    return "PUSH";
});

lexer.addRule(/add/, function () {
    return "ADD";
});

lexer.addRule(/print/, function () {
    return "PRINT";
});
</script>

It's really simple. You can fiddle with the program and modify it to your needs. Best of luck.

Answer 2:

I think you will find a paper on "META II" really enlightening. It shows how to define a pushdown stack compiler machine and a compiler for it, in 10 short but mind-bending pages. Once you understand it, writing pushdown stack interpreters will forever be easy. See this answer: https://stackoverflow.com/a/1005680/120163

Source: https://stackoverflow.com/questions/13466600/how-would-i-go-about-implementing-a-simple-stack-based-programming-language

Tags

parsing

lexer

stack-based