The Story:
When a list of strings is defined across multiple lines, it is easy to forget a comma between list items.
I wrote the following code based on @Jim's post. Hopefully it works in all situations:
import tokenize
from io import BytesIO


def my_checker(pycode):
    """
    Tokenizes Python code and yields
    (start, end, strline) for any position where
    a scenario like this happens (missing string separator):
      [..., "a string" "derp", ...]
    """
    IDLE = 0                # outside of any list
    WAITING_STRING = 1      # inside a list, waiting for a string literal
    CHECKING_SEPARATOR = 2  # just saw a string, expecting ',', '+' or ']'

    tokenizer = tokenize.tokenize(BytesIO(pycode.encode('utf-8')).readline)
    state = IDLE

    for toknum, tokval, start, end, strcode in tokenizer:
        if state == IDLE:
            if toknum == tokenize.OP and tokval == '[':
                state = WAITING_STRING
        elif state == WAITING_STRING:
            if toknum == tokenize.STRING:
                state = CHECKING_SEPARATOR
            # a closing bracket ends the list
            elif toknum == tokenize.OP and tokval == ']':
                state = IDLE
        elif state == CHECKING_SEPARATOR:
            if toknum == tokenize.STRING:
                # two string literals in a row: a separator is missing
                yield (start, end, strcode)
            elif toknum == tokenize.OP and tokval in ['+', ',']:
                state = WAITING_STRING
            elif toknum == tokenize.OP and tokval == ']':
                state = IDLE
my_code = """
foo = "derp"
def derp(a,x):
    return str('dingdong'+str(a*x))
[
    "derp"+"FOO22" , "FOO", "donk" "slurp",0, 0
]

class extreme_logical_class():
    STATIC_BAD_LIST = [0,
        "BLA,",
        "FOO"
        "derp"
    ]
    def __init__(self):
        self._in_method_check = ["A" "B"]

nested_list = [
    ['DERP','FOO'],
    [0,'hello', 'peter' 'pan'],
    ['this', 'is', ['ultra', 'mega'
        'nested']]
]
"""
for error in my_checker(my_code):
    print('missing , in list at: line {}@{} to line {}@{}: "{}"'.format(
        error[0][0], error[0][1], error[1][0], error[1][1], error[2].strip()
    ))
The result is:
keksnicoh@localhost ~ % python3 find_bad_lists.py
missing , in list at: line 6@36 to line 6@43: ""derp"+"FOO22" , "FOO", "donk" "blurp",0 0"
missing , in list at: line 13@8 to line 13@14: ""derp""
missing , in list at: line 16@37 to line 16@40: "self._in_method_check = ["A" "B"]"
missing , in list at: line 20@24 to line 20@29: "[0,'hello', 'peter' 'pan'],"
missing , in list at: line 22@8 to line 22@16: "'nested']]"
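To see why the checker only needs to look at STRING and OP tokens, it can help to print the raw token stream for a small list. The following is a minimal sketch; the helper name `token_stream` is made up for this illustration and is not part of the checker above.

```python
import tokenize
from io import BytesIO


def token_stream(source):
    """Return (token name, token text) pairs, skipping layout-only tokens.

    `token_stream` is a hypothetical helper used only for illustration.
    """
    skip = {tokenize.ENCODING, tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    tokens = tokenize.tokenize(BytesIO(source.encode("utf-8")).readline)
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokens if tok.type not in skip]


# Two STRING tokens with no OP token between them mark the missing comma.
for name, text in token_stream('x = ["a" "b", "c"]'):
    print(name, text)
```

In the printed stream, `"a"` and `"b"` show up as two consecutive STRING tokens, while `"b"` and `"c"` are separated by an OP token for the comma. That difference is what the state machine reacts to.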
In real life I would prefer to avoid making such mistakes in the first place; there are good IDEs like Sublime Text which let you edit and format lists with multiple cursors. Once you get used to those concepts, this sort of "separation" error won't happen in your code.
Of course, if you have a team of developers, you could integrate such a tool into the testing environment.
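One way this could look, as a rough sketch only: a pytest-style test that scans every `.py` file under a directory and fails on any finding. The names `find_adjacent_strings`, `scan_tree` and `SOURCE_ROOT` are hypothetical, and the checker here is a simplified stand-in (it tracks bracket depth and flags any two consecutive string literals inside square brackets), not the state machine from above.

```python
import tokenize
from io import BytesIO
from pathlib import Path

SOURCE_ROOT = "."  # assumption: the project sources live under the current directory


def find_adjacent_strings(source):
    """Return start positions of string literals that directly follow
    another string literal inside square brackets (simplified stand-in)."""
    layout = {tokenize.NL, tokenize.COMMENT}
    depth = 0
    prev_was_string = False
    hits = []
    for tok in tokenize.tokenize(BytesIO(source.encode("utf-8")).readline):
        if tok.type in layout:
            continue  # line breaks and comments do not separate list items
        if tok.type == tokenize.OP and tok.string == "[":
            depth += 1
        elif tok.type == tokenize.OP and tok.string == "]":
            depth = max(depth - 1, 0)
        if tok.type == tokenize.STRING and prev_was_string and depth > 0:
            hits.append(tok.start)
        prev_was_string = tok.type == tokenize.STRING
    return hits


def scan_tree(root):
    """Map each .py file under root to its findings (files without findings are omitted)."""
    problems = {}
    for path in Path(root).rglob("*.py"):
        hits = find_adjacent_strings(path.read_text(encoding="utf-8"))
        if hits:
            problems[str(path)] = hits
    return problems


def test_no_missing_commas():
    """Fails if any scanned file contains adjacent string literals in brackets."""
    problems = scan_tree(SOURCE_ROOT)
    assert not problems, "possible missing commas: {}".format(problems)
```

Note that this also flags intentional implicit concatenation inside brackets, so a team would probably want some way to whitelist those spots.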