Question
I have a string of arbitrary length consisting of alternating runs of lowercase letters and numbers (integers or floats). I wish to split it into parts, each of maximal possible size, such that each part is either a run of letters or a string representing a number.
I don't need to regard special forms of numbers, such as exponents, hexadecimal, etc.; just simple floating point or an integer.
A few examples:
>>> split("")
()
>>> split("p")
('p',)
>>> split("2")
('2',)
>>> split("a2b3")
('a', '2', 'b', '3')
>>> split("a2.1b3")
('a', '2.1', 'b', '3')
>>> split("a.1b3")
('a', '.1', 'b', '3')
However, the following calls should raise some error:
>>> split(3)
>>> split("a0.10.2")
>>> split("ab.c")
My first attempt used re.split. However, it is quite naive: if I use the letters as delimiters, they are not kept in the result:
>>> re.split("[a-z]", "a.1b3")
['', '.1', '3']
My second attempt was using itertools.groupby. The problem is that it does not care about the form of the number, so, for example:
>>> islowalpha = lambda s: str.isalpha(s) and str.islower(s)
>>> [''.join(g) for _, g in itertools.groupby("a0.10.2b", islowalpha)] # should raise
['a', '0.10.2', 'b']
Note: I don't care about the form of the output, as long as it is iterable.
Note: I've read this, but I could not adapt the solution to my problem. The main difference is that I need to allow only acceptable numbers, and not a simple list of digits and points.
Answer 1:
import re

def split_gen(x):
    # Each match fills exactly one group: f is a run of digits and points,
    # s is a run of any other characters.
    for f, s in re.findall(r'([\d.]+)|([^\d.]+)', x):
        if f:
            float(f)  # raises ValueError if f is not a valid number
            yield f
        else:
            yield s

def split(x):
    '''
    >>> split("")
    ()
    >>> split("p")
    ('p',)
    >>> split("2")
    ('2',)
    >>> split("a2b3")
    ('a', '2', 'b', '3')
    >>> split("a2.1b3")
    ('a', '2.1', 'b', '3')
    >>> split("a.1b3")
    ('a', '.1', 'b', '3')
    >>> split(3)
    Traceback (most recent call last):
    ...
    TypeError: expected string or buffer
    >>> split("a0.10.2")
    Traceback (most recent call last):
    ...
    ValueError: could not convert string to float: '0.10.2'
    >>> split("ab.c")
    Traceback (most recent call last):
    ...
    ValueError: could not convert string to float: '.'
    '''
    return tuple(split_gen(x))

if __name__ == '__main__':
    import doctest
    doctest.testmod()
Answer 2:
A bit of play with re.sub and itertools.cycle:
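import itertools
import re

def split(s):
    res = []

    def replace(matchobj):
        # Record the matched prefix and strip it from the string.
        res.append(matchobj.group(0))
        return ''

    letter = re.compile(r'^([a-z]+)')
    number = re.compile(r'^(\.\d+|\d+\.\d+|\d+)')
    # Start the cycle with whichever pattern matches the beginning of s.
    if letter.match(s):
        c = itertools.cycle([letter, number])
    else:
        c = itertools.cycle([number, letter])
    for op in c:
        mods = op.sub(replace, s)
        if len(s) == len(mods):
            # Nothing was consumed: the input is malformed, so return None.
            return
        elif not mods:
            # The whole string has been consumed.
            return res
        s = mods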
The basic idea: create two alternating re patterns and match them, in turn, against the start of what is left of the input string.
A demo with some of your examples:
>>> split("2")
['2']
>>> split("a2b3")
['a', '2', 'b', '3']
>>> split("a.1b3")
['a', '.1', 'b', '3']
>>> split("a0.10.2")
>>> split("ab.c")
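The last two calls in the demo return None rather than raising, while the question asks for an error. A minimal sketch of the same alternating-pattern idea that raises instead is shown here. The name split_strict and the error messages are illustrative assumptions, not part of the answer above, and the scan uses Pattern.match with a start position instead of re.sub.

```python
import itertools
import re

LETTERS = re.compile(r'[a-z]+')
NUMBER = re.compile(r'\d+\.\d+|\.\d+|\d+')

def split_strict(s):
    """Split s into alternating letter and number parts, or raise."""
    if not isinstance(s, str):
        raise TypeError('expected a string')
    # Start the cycle with whichever pattern matches the beginning of s.
    order = [LETTERS, NUMBER] if LETTERS.match(s) else [NUMBER, LETTERS]
    parts, pos = [], 0
    for pattern in itertools.cycle(order):
        if pos == len(s):
            return parts
        # match() anchors at pos, so no leading '^' is needed.
        m = pattern.match(s, pos)
        if m is None:
            raise ValueError('unexpected text at position %d: %r' % (pos, s[pos:]))
        parts.append(m.group(0))
        pos = m.end()

print(split_strict("a2.1b3"))
```

With this variant, "a0.10.2" fails at the second point (a number cannot directly follow a number) and "ab.c" fails at the point (no digits follow it).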
Answer 3:
The problem is that the premise of your question is ambiguous. How can you distinguish floats from an arbitrary string? There are many ways to interpret the same characters. For example,
0.10.2
This can mean 0.1, 0.2 or 0, .10, .2.
What if the number is
27.6734.98?
You need to specify first what kind of numbers you expect and what format they will be in. For example: every number has exactly one digit before the decimal point.
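One way to pin the format down is to write it as a pattern and require each candidate token to match it in full. The sketch below assumes the format described in the question (digits with at most one decimal point, a leading point allowed, no signs or exponents). The names SIMPLE_NUMBER and is_simple_number are hypothetical, and the choice to reject a trailing point such as "1." is an assumption.

```python
import re

# Assumed format: digits, optionally followed by a point and more digits,
# or a leading point followed by digits. No signs, exponents, or hex.
SIMPLE_NUMBER = re.compile(r'\d+(?:\.\d+)?|\.\d+')

def is_simple_number(token):
    """Return True if the whole token is one number in the assumed format."""
    return SIMPLE_NUMBER.fullmatch(token) is not None

for token in ('2', '2.1', '.1', '0.10.2', '27.6734.98', '.'):
    print(token, is_simple_number(token))
```

Under this rule, both 0.10.2 and 27.6734.98 are rejected as a whole instead of being split into several numbers.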
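Answer 4:
import re

string = 'a.2b3c4.5d'
# Letters are matched as whole runs; the number alternatives are tried left to right.
REG_STR = r'([a-zA-Z]+)|(\.\d+)|(\d+\.\d+)|(\d+)'
# finditer skips any characters that match none of the alternatives.
matches = [m.group() for m in re.finditer(REG_STR, string)]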
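Because re.finditer silently skips text that matches nothing, the approach above never raises on input such as "ab.c", and it accepts two numbers in a row such as "0.10" followed by ".2". A possible sketch that adds both checks follows. The function name split_checked, the group names, and the restriction to lowercase letters (taken from the question) are assumptions for illustration.

```python
import re

TOKEN = re.compile(r'(?P<word>[a-z]+)|(?P<number>\d+\.\d+|\.\d+|\d+)')

def split_checked(text):
    """Tokenize text with finditer, raising ValueError on malformed input."""
    parts, pos, prev_kind = [], 0, None
    for m in TOKEN.finditer(text):
        # A gap before this match means some characters fit neither group.
        if m.start() != pos:
            raise ValueError('unexpected text: %r' % text[pos:m.start()])
        # lastgroup is 'word' or 'number'; the kinds must alternate.
        if m.lastgroup == prev_kind:
            raise ValueError('two %ss in a row at position %d' % (m.lastgroup, pos))
        parts.append(m.group())
        pos, prev_kind = m.end(), m.lastgroup
    # Leftover characters after the last match are also an error.
    if pos != len(text):
        raise ValueError('unexpected text: %r' % text[pos:])
    return parts

print(split_checked('a.2b3c4.5d'))
```

A non-string argument such as 3 already raises TypeError inside finditer, so no explicit type check is added here.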
Source: https://stackoverflow.com/questions/22863430/split-a-string-consisting-of-letters-and-numbers-into-parts