I am looking for a Pythonic way to split a sentence into words and also store the index information of each word in the sentence, e.g.
a = "This is a sentence"
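Here is a method using regular expressions. The pattern \S+ matches each run of non-whitespace characters, and m.end()-1 makes the end index inclusive: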
>>> import re
>>> a = "This is a sentence"
>>> matches = [(m.group(0), (m.start(), m.end()-1)) for m in re.finditer(r'\S+', a)]
>>> matches
[('This', (0, 3)), ('is', (5, 6)), ('a', (8, 8)), ('sentence', (10, 17))]
>>> b, c = zip(*matches)
>>> b
('This', 'is', 'a', 'sentence')
>>> c
((0, 3), (5, 6), (8, 8), (10, 17))
As a one-liner:
b, c = zip(*[(m.group(0), (m.start(), m.end()-1)) for m in re.finditer(r'\S+', a)])
If you just want the indices:
c = [(m.start(), m.end()-1) for m in re.finditer(r'\S+', a)]
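One caveat with the one-liner: if the string contains no words at all (empty or only whitespace), zip(*[]) produces nothing and the unpacking into b, c raises a ValueError. Below is a minimal sketch of a wrapper that guards against that case. The function name words_with_spans is just something I made up for illustration, and returning two empty tuples for the empty case is my own assumption about what you would want.

```python
import re

def words_with_spans(text):
    """Return (words, spans); each span is an inclusive (start, end) pair."""
    matches = [(m.group(0), (m.start(), m.end() - 1))
               for m in re.finditer(r'\S+', text)]
    if not matches:
        # zip(*[]) yields nothing, so "b, c = zip(*matches)" would raise
        # ValueError; return two empty tuples instead (an assumed convention).
        return (), ()
    words, spans = zip(*matches)
    return words, spans

a = "This is a sentence"
b, c = words_with_spans(a)

# The end index is inclusive, so slicing back into the string needs end + 1.
recovered = tuple(a[start:end + 1] for start, end in c)
```

If you would rather slice with a[start:end] directly, drop the -1 and keep m.end() as is, since re match objects already report an exclusive end.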