Question
I found the second answer to "Parsing FIX protocol in regex?" very nice, so I tried it out.
Here is my code:
import re

new_order_finder1 = re.compile("(?:^|\x01)(11|15|55)=(.*?)\x01")
new_order_finder2 = re.compile("(?:^|\x01)(15|55)=(.*?)\x01")
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)\x01")
if __name__ == "__main__":
    line = "20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x0149=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x0111=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x0144=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x0160=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01"
    fields = dict(re.findall(new_order_finder1, line))
    print(fields)
    fields2 = dict(re.findall(new_order_finder2, line))
    print(fields2)
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)
Here is the output:
{'11': 'N09080243', '55': 'AAPL.O'}
{'55': 'AAPL.O', '15': 'USD'}
{'35': 'D', '38': '2100', '11': 'N09080243', '54': '1'}
It looks like some of the fields are not matched by the regex: new_order_finder1 misses tag 15, and new_order_finder3 misses tags 15 and 55.
What's the problem here?
Answer 1:
The problem is that the \x01 at the end of the pattern consumes the \x01 separator. The pattern therefore always fails on a key-value pair adjacent to one just matched: the scan resumes after the consumed separator, where neither alternative in (?:^|\x01) can match.
Using this substring of your input as an example, matched against new_order_finder3:
\x0154=1\x0155=AAPL.O\x01
------------
            X
As you can see, after the pattern matches the key-value pair 54=1, it also consumes the \x01 that follows, so the adjacent key-value pair can never be matched.
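To see the consumption directly, here is a minimal sketch that runs an original-style pattern on that substring. The names consuming and sample are illustrative only, and the tag list is trimmed to 54 and 55 for brevity.

```python
import re

# Original-style pattern: the trailing \x01 is consumed as part of each match.
consuming = re.compile("(?:^|\x01)(54|55)=(.*?)\x01")

sample = "\x0154=1\x0155=AAPL.O\x01"

first = consuming.search(sample)
# The match covers "\x0154=1\x01", so the scan resumes at the "5" of "55",
# where neither ^ nor \x01 can match.
resume_at = first.end()
next_char = sample[resume_at]

pairs = consuming.findall(sample)
print(first.span(), repr(next_char), pairs)
# prints: (0, 6) '5' [('54', '1')]
```

The separator at index 5 belongs to the first match, so nothing is left to introduce 55=AAPL.O.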
There is more than one way to resolve this issue. One solution is to place the \x01 at the end in a look-ahead assertion, which ensures that \x01 ends the key-value pair without consuming it:
new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")
The output now contains all the expected fields:
{'11': 'N09080243', '38': '2100', '15': 'USD', '55': 'AAPL.O', '54': '1', '35': 'D'}
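As one of the other possible approaches, the message could be split on the separator instead of being scanned with a regex. This is only a sketch: parse_fix_fields and its wanted parameter are hypothetical names, the sample line is a shortened version of the one in the question, and it assumes no field value itself contains \x01.

```python
def parse_fix_fields(message, wanted):
    """Return {tag: value} for the tags listed in `wanted` (hypothetical helper)."""
    fields = {}
    # Each chunk between \x01 separators is expected to look like "tag=value".
    for pair in message.split("\x01"):
        tag, sep, value = pair.partition("=")
        # Skip empty chunks (e.g. after the final \x01) and unwanted tags.
        if sep and tag in wanted:
            fields[tag] = value
    return fields


line = ("8=FIX.4.2\x0135=D\x0111=N09080243\x0115=USD\x01"
        "38=2100\x0154=1\x0155=AAPL.O\x0110=087\x01")
fields = parse_fix_fields(line, {"11", "15", "35", "38", "54", "55"})
print(fields)
```

Because nothing is consumed between pairs, adjacent fields such as 54 and 55 are both picked up.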
Answer 2:
The trailing \x01 consumes the separator that the next match needs. The regex matcher resumes each search right after the end of the previous match.
With a lookahead, the fix is easy: just replace the final \x01 with (?=\x01).
import re

new_order_finder3 = re.compile("(?:^|\x01)(11|15|35|38|54|55)=(.*?)(?=\x01)")
if __name__ == "__main__":
    line = ("20150702-05:36:08.687 : 8=FIX.4.2\x019=209\x0135=D\x0134=739\x01"
            "49=PINE\x0152=20150702-05:36:08.687\x0156=CSUS\x011=KI\x01"
            "11=N09080243\x0115=USD\x0121=2\x0122=5\x0138=2100\x0140=2\x01"
            "44=126\x0148=AAPL.O\x0154=1\x0155=AAPL.O\x0157=DMA\x0158=TEXT\x01"
            "60=20150702-05:36:08.687\x01115=Tester\x016061=9\x0110=087\x01")
    fields3 = dict(re.findall(new_order_finder3, line))
    print(fields3)
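Since the matcher resumes right after the previous match, the same idea might also be applied on the leading side. The sketch below is a variation, not something either answer proposes: it swaps the leading (?:^|\x01) for a negative lookbehind and keeps the trailing \x01. The name finder and the shortened sample line are illustrative.

```python
import re

# Assumption: (?<![^\x01]) stands in for the leading (?:^|\x01) group.
# It succeeds at the start of the string or right after a \x01 and consumes
# nothing, so the trailing \x01 can stay in the pattern.
finder = re.compile("(?<![^\x01])(11|15|35|38|54|55)=(.*?)\x01")

line = ("8=FIX.4.2\x0135=D\x0111=N09080243\x0115=USD\x01"
        "38=2100\x0154=1\x0155=AAPL.O\x0110=087\x01")
fields = dict(finder.findall(line))
print(fields)
```

Here the separator consumed by one match is still visible to the lookbehind of the next, so adjacent pairs are not skipped.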
Source: https://stackoverflow.com/questions/31198950/parsing-fix-message-in-regex