Parsing Python function calls to get argument positions

Submitted by 拟墨画扇 on 2019-12-04 07:21:51

This code uses a combination of ast (to find the initial argument offsets) and regular expressions (to identify boundaries of the arguments):

import ast
import re

def collect_offsets(call_string):
    def _abs_offset(lineno, col_offset):
        # Convert ast's (lineno, col_offset) into an index into call_string.
        # Note: col_offset counts UTF-8 bytes, which equals the character
        # index only for ASCII source.
        total = 0
        for current_lineno, line in enumerate(call_string.splitlines(True), 1):
            if current_lineno == lineno:
                return col_offset + total
            # splitlines(True) keeps line endings, so newlines are counted
            total += len(line)
    # parse call_string with ast
    call = ast.parse(call_string).body[0].value
    # collect offsets provided by ast
    offsets = []
    for arg in call.args:
        # on Python 3.5+, *args shows up here as an ast.Starred node
        a = arg
        while isinstance(a, ast.BinOp):
            a = a.left
        offsets.append(_abs_offset(a.lineno, a.col_offset))
    for kw in call.keywords:
        # on Python 3.5+, **kwargs shows up here as a keyword with arg=None
        offsets.append(_abs_offset(kw.value.lineno, kw.value.col_offset))
    # *args may follow keyword arguments, so restore source order
    offsets.sort()
    offsets.append(len(call_string))
    return offsets

def argpos(call_string):
    def _find_start(prev_end, offset):
        # the argument starts after the last "(" or "," and any whitespace
        s = call_string[prev_end:offset]
        m = re.search(r'(\(|,)(\s*)(.*?)$', s)
        return prev_end + m.start(3)
    def _find_end(start, next_offset):
        # the argument ends before the last "," or ")", minus trailing whitespace
        s = call_string[start:next_offset]
        m = re.search(r'(\s*)$', s[:max(s.rfind(','), s.rfind(')'))])
        return start + m.start()

    offsets = collect_offsets(call_string)

    result = []
    # previous end
    end = 0
    # given offsets = [9, 14, 21, ...],
    # zip(offsets, offsets[1:]) yields (9, 14), (14, 21), ...
    for offset, next_offset in zip(offsets, offsets[1:]):
        start = _find_start(end, offset)
        end = _find_end(start, next_offset)
        result.append((start, end))
    return result

if __name__ == '__main__':
    try:
        while True:
            call_string = input()
            positions = argpos(call_string)
            for start, end in positions:
                marker = ' ' * start + '^'
                if end - start > 1:
                    marker += ' ' * (end - start - 2) + '^'
                print(marker)
            print(positions)
    except (EOFError, KeyboardInterrupt):
        pass

Output:

whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowargs)
         ^ ^
              ^   ^
                     ^      ^
                               ^  ^
                                     ^    ^
                                             ^       ^
                                                        ^        ^
[(9, 12), (14, 19), (21, 29), (31, 35), (37, 43), (45, 54), (56, 66)]
f(1, len(document_text) - 1 - position)
  ^
     ^                               ^
[(2, 3), (5, 38)]
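
On Python 3.9 and later the regular expressions may not be needed at all, because ast nodes (including ast.keyword) also carry end_lineno and end_col_offset. The following is only a minimal sketch under that assumption. arg_spans is a hypothetical name and is not part of the answer above.

```python
import ast


def arg_spans(call_string):
    """Return sorted (start, end) character spans, one per call argument."""
    call = ast.parse(call_string).body[0].value
    if not isinstance(call, ast.Call):
        raise ValueError("expected a single function call")

    lines = call_string.splitlines(True)
    # Index in call_string at which each line begins.
    line_starts = [0]
    for line in lines:
        line_starts.append(line_starts[-1] + len(line))

    def to_index(lineno, col_offset):
        # ast column offsets count UTF-8 bytes, so convert them to characters.
        line = lines[lineno - 1]
        chars = len(line.encode("utf-8")[:col_offset].decode("utf-8"))
        return line_starts[lineno - 1] + chars

    # Assumption: Python 3.9+, where ast.keyword nodes have position attributes.
    nodes = list(call.args) + list(call.keywords)
    spans = [
        (to_index(n.lineno, n.col_offset),
         to_index(n.end_lineno, n.end_col_offset))
        for n in nodes
    ]
    # *args may follow keyword arguments, so sort into source order.
    return sorted(spans)
```

Calling arg_spans on the two inputs shown in the output above should give the same position lists, without any boundary guessing.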

You may want to get the abstract syntax tree for a call to your function.

Here is a Python recipe to do so, based on the ast module.

Python's ast module is used to parse the code string and create an AST node. The recipe then walks the resulting ast.AST node with a NodeVisitor subclass to find its features.

The function explain does the parsing. Here is how you analyse your function call, and what you get:

>>> explain('mymod.nestmod.func("arg1", "arg2", kw1="kword1", kw2="kword2", *args, **kws)')
[Call(args=['arg1', 'arg2'], keywords={'kw1': 'kword1', 'kw2': 'kword2'},
      starargs='args', func='mymod.nestmod.func', kwargs='kws')]
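
The recipe itself is not reproduced here. As a rough idea of what such a NodeVisitor subclass could look like on Python 3.9+, here is a minimal sketch. CallCollector and explain_calls are hypothetical names, and the dictionary layout only imitates the output shown above. It is not the recipe's actual code.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Collect a summary of every call found in a parsed source string."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        def show(expr):
            # Literal constants are reported by value, anything else as source text.
            if isinstance(expr, ast.Constant):
                return expr.value
            return ast.unparse(expr)  # ast.unparse requires Python 3.9+

        summary = {"func": ast.unparse(node.func), "args": [],
                   "keywords": {}, "starargs": None, "kwargs": None}
        for arg in node.args:
            if isinstance(arg, ast.Starred):
                # Simplification: only the last *iterable is remembered.
                summary["starargs"] = show(arg.value)
            else:
                summary["args"].append(show(arg))
        for kw in node.keywords:
            if kw.arg is None:
                # A keyword without a name is a **mapping argument.
                summary["kwargs"] = show(kw.value)
            else:
                summary["keywords"][kw.arg] = show(kw.value)
        self.calls.append(summary)
        self.generic_visit(node)  # keep walking so nested calls are found too


def explain_calls(source):
    collector = CallCollector()
    collector.visit(ast.parse(source))
    return collector.calls
```

Plain dictionaries are used here instead of a custom Call class to keep the sketch short.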

If I understand correctly, from your example you want something like:

--> arguments("whatever(foo, baz(), 'puppet', 24+2, meow=3, *meowargs, **meowkwds)")
{
  'foo': slice(9, 12),
  'baz()': slice(14, 19),
  "'puppet'": slice(21, 29),
  '24+2': slice(31, 35),
  'meow=3': slice(37, 43),
  '*meowargs': slice(45, 54),
  '**meowkwds': slice(56, 66),
}

Note that I changed the name of your last argument, as you can't have two arguments with the same name.

If this is what you want, then you need the original string in question (which shouldn't be a problem if you're building an IDE) and a string parser. A simple state machine should do the trick.
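
A minimal sketch of such a state machine follows. It tracks two pieces of state: whether the scanner is inside a string literal, and which brackets are currently open. split_arguments and _store are hypothetical names. The sketch assumes the callee is a plain dotted name (so the first bracket opens the argument list) and that no triple-quoted strings or comments occur.

```python
def _store(pairs, text, start, end):
    # Record text[start:end] without surrounding whitespace, if non-empty.
    segment = text[start:end]
    stripped = segment.strip()
    if stripped:
        begin = start + len(segment) - len(segment.lstrip())
        pairs.append((stripped, slice(begin, begin + len(stripped))))


def split_arguments(call_string):
    """Return (argument text, slice) pairs for the top-level arguments."""
    closers = {"(": ")", "[": "]", "{": "}"}
    pairs = []
    stack = []      # closing brackets still expected, innermost last
    quote = None    # active quote character while inside a string literal
    start = None    # index at which the current argument began
    i = 0
    while i < len(call_string):
        ch = call_string[i]
        if quote:
            if ch == "\\":
                i += 1                  # skip the escaped character
            elif ch == quote:
                quote = None            # string literal ends
        elif ch in "'\"":
            quote = ch                  # string literal begins
        elif ch in closers:
            stack.append(closers[ch])
            if len(stack) == 1:
                start = i + 1           # the call's own opening parenthesis
        elif stack and ch == stack[-1]:
            stack.pop()
            if not stack:               # the call's own closing parenthesis
                _store(pairs, call_string, start, i)
                break
        elif ch == "," and len(stack) == 1:
            _store(pairs, call_string, start, i)
            start = i + 1               # next argument begins after the comma
        i += 1
    return pairs
```

The function returns a list of pairs rather than a dictionary so that repeated arguments such as f(1, 1) are not merged. Passing the result to dict() gives a mapping in the shape shown above.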
