Python AST from ANTLR Parse Tree?

Submitted by 末鹿安然 on 2019-12-04 09:53:35

The following could be a start:

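import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.tree.ParseTree;

public class AST {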

    private final Object payload;

    private final List<AST> children;

    public AST(ParseTree tree) {
        this(null, tree);
    }

    private AST(AST ast, ParseTree tree) {
        this(ast, tree, new ArrayList<AST>());
    }

    private AST(AST parent, ParseTree tree, List<AST> children) {

        this.payload = getPayload(tree);
        this.children = children;

        if (parent == null) {
            walk(tree, this);
        }
        else {
            parent.children.add(this);
        }
    }

    public Object getPayload() {
        return payload;
    }

    public List<AST> getChildren() {
        return new ArrayList<>(children);
    }

    private Object getPayload(ParseTree tree) {
        if (tree.getChildCount() == 0) {
            return tree.getPayload();
        }
        else {
            String ruleName = tree.getClass().getSimpleName().replace("Context", "");
            return Character.toLowerCase(ruleName.charAt(0)) + ruleName.substring(1);
        }
    }

    private static void walk(ParseTree tree, AST ast) {

        if (tree.getChildCount() == 0) {
            new AST(ast, tree);
        }
        else if (tree.getChildCount() == 1) {
            walk(tree.getChild(0), ast);
        }
        else if (tree.getChildCount() > 1) {

            for (int i = 0; i < tree.getChildCount(); i++) {

                AST temp = new AST(ast, tree.getChild(i));

                if (!(temp.payload instanceof Token)) {
                    walk(tree.getChild(i), temp);
                }
            }
        }
    }

    @Override
    public String toString() {

        StringBuilder builder = new StringBuilder();

        AST ast = this;
        List<AST> firstStack = new ArrayList<>();
        firstStack.add(ast);

        List<List<AST>> childListStack = new ArrayList<>();
        childListStack.add(firstStack);

        while (!childListStack.isEmpty()) {

            List<AST> childStack = childListStack.get(childListStack.size() - 1);

            if (childStack.isEmpty()) {
                childListStack.remove(childListStack.size() - 1);
            }
            else {
                ast = childStack.remove(0);
                String caption;

                if (ast.payload instanceof Token) {
                    Token token = (Token) ast.payload;
                    caption = String.format("TOKEN[type: %s, text: %s]",
                            token.getType(), token.getText().replace("\n", "\\n"));
                }
                else {
                    caption = String.valueOf(ast.payload);
                }

                String indent = "";

                for (int i = 0; i < childListStack.size() - 1; i++) {
                    indent += (childListStack.get(i).size() > 0) ? "|  " : "   ";
                }

                builder.append(indent)
                        .append(childStack.isEmpty() ? "'- " : "|- ")
                        .append(caption)
                        .append("\n");

                if (ast.children.size() > 0) {
                    List<AST> children = new ArrayList<>();
                    for (int i = 0; i < ast.children.size(); i++) {
                        children.add(ast.children.get(i));
                    }
                    childListStack.add(children);
                }
            }
        }

        return builder.toString();
    }
}

and can be used to create an AST for the input "f(arg1='1')\n" as follows:

public static void main(String[] args) {

    // ANTLRInputStream is deprecated as of ANTLR 4.7; CharStreams.fromString replaces it.
    Python3Lexer lexer = new Python3Lexer(CharStreams.fromString("f(arg1='1')\n"));
    Python3Parser parser = new Python3Parser(new CommonTokenStream(lexer));

    ParseTree tree = parser.file_input();
    AST ast = new AST(tree);

    System.out.println(ast);
}

which would print:

'- file_input
   |- stmt
   |  |- small_stmt
   |  |  |- atom
   |  |  |  '- TOKEN[type: 35, text: f]
   |  |  '- trailer
   |  |     |- TOKEN[type: 47, text: (]
   |  |     |- arglist
   |  |     |  |- test
   |  |     |  |  '- TOKEN[type: 35, text: arg1]
   |  |     |  |- TOKEN[type: 53, text: =]
   |  |     |  '- test
   |  |     |     '- TOKEN[type: 36, text: '1']
   |  |     '- TOKEN[type: 48, text: )]
   |  '- TOKEN[type: 34, text: \n]
   '- TOKEN[type: -1, text: ]

I realize this still contains nodes you might not want, but you could even add a set of token types you'd like to exclude. Feel free to hack away!
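To illustrate the exclusion idea, here is a rough sketch. It does not use the AST class above: Node and TokenFilter are hypothetical stand-ins I made up so the snippet compiles without ANTLR on the classpath. The token type numbers (47 and 48 for the parentheses) are simply copied from the printed tree and will differ for other grammar versions. In the real class you would do the same check against ((Token) payload).getType().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the AST class: a node is either a rule
// (tokenType == RULE) or a token carrying an integer type, as ANTLR tokens do.
class Node {
    // Sentinel for rule nodes; -1 cannot be used because ANTLR uses it for EOF.
    static final int RULE = Integer.MIN_VALUE;

    final String text;
    final int tokenType;
    final List<Node> children = new ArrayList<>();

    Node(String text, int tokenType, Node... kids) {
        this.text = text;
        this.tokenType = tokenType;
        this.children.addAll(Arrays.asList(kids));
    }

    static Node rule(String name, Node... kids) {
        return new Node(name, RULE, kids);
    }

    static Node token(int type, String text) {
        return new Node(text, type);
    }

    boolean isToken() {
        return tokenType != RULE;
    }
}

class TokenFilter {
    // Returns a copy of the tree in which every token node whose type is in
    // `excluded` has been dropped. The input tree is left untouched.
    static Node prune(Node node, Set<Integer> excluded) {
        Node copy = new Node(node.text, node.tokenType);
        for (Node child : node.children) {
            if (child.isToken() && excluded.contains(child.tokenType)) {
                continue; // skip unwanted tokens such as "(" and ")"
            }
            copy.children.add(prune(child, excluded));
        }
        return copy;
    }
}
```

Usage would look something like TokenFilter.prune(root, new HashSet<>(Arrays.asList(47, 48))) to get rid of the parentheses. Pruning a copy keeps the original tree around in case you need it later. Filtering inside walk while the tree is being built would work just as well and avoids the second pass.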

Here is a Gist containing a version of the code above with the proper import statements and some JavaDocs and inline comments.

The Python subproject of the Eclipse DLTK project implements a custom Python AST model in Java. It is built from an ANTLR v3 AST, but should not be too difficult to refit to build from an ANTLR v4 parse tree.

The Eclipse PyDev project presumably also implements a Java-based AST for Python source. Note that the layout of the source tree in both projects should be quite similar.

Naturally, you should check the licenses before using code from these sources, just to be sure.

I found a workaround:

Use Jython and ast (thanks @delnan for leading me there). Or, do everything you need directly in Python code, and just spit out the results back to Java.

PythonInterpreter interpreter = new PythonInterpreter();
interpreter.exec("import ast");
PyObject o = interpreter.eval(
    "ast.dump(ast.parse('f(arg1=\\'1\\')', 'filename', 'eval'))" + "\n");
System.out.print(o.toString());

Output is

Expression(body=Call(func=Name(id='f', ctx=Load()), args=[], keywords=[keyword(arg='arg1', value=Str(s='1'))], starargs=None, kwargs=None))

This doesn't strictly answer the question, and might not be applicable for all users, so I'm leaving this answer unselected.

ANTLR4 can generate a visitor, which you can use to traverse the parse tree and to construct an AST. Python has an ast package, so this should not be a problem (if you're using Python).

I have written a toy Python interpreter in Python 3 using ANTLR4 (as a part of my study). Visitor code is located in /tinypy/AST/builder/, so you can get an idea of how it's done.
