Parsing FIX protocol in regex?

↘锁芯ラ submitted on 2019-12-05 09:12:01

Use a regex tool like Expresso or RegexBuddy.
Why don't you split on ^A and then match ([^=]+)=(.*) against each piece, putting the results into a hash? You could also filter with a switch whose default case skips the tags you are not interested in, and whose cases for the tags you do want all fall through to the code that adds them.
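A minimal Python sketch of the split-then-match approach, assuming the message is a str with SOH ("\x01", displayed as ^A) as the field delimiter. The names parse_fields and wanted and the sample message are illustrative and not from the original post; a set membership test stands in for the switch.

```python
import re

SOH = "\x01"  # field delimiter, displayed as ^A in many editors
# Tag is everything before the first "=", value is the rest of the piece.
FIELD_RE = re.compile(r"([^=]+)=(.*)", re.DOTALL)

def parse_fields(raw_msg, wanted=None):
    """Split on SOH, then match tag=value on each piece.

    `wanted` is an optional set of tag strings to keep (hypothetical
    parameter); when it is None, every field is kept.
    """
    fields = {}
    for piece in raw_msg.split(SOH):
        m = FIELD_RE.fullmatch(piece)
        if m is None:
            continue  # empty trailing piece or a piece without "="
        tag, value = m.groups()
        if wanted is None or tag in wanted:
            fields[tag] = value
    return fields

# Made-up sample message, for illustration only.
sample = SOH.join(["8=FIX.4.2", "35=0", "34=12", "49=SENDER", "56=TARGET"]) + SOH
print(parse_fields(sample, wanted={"34", "49", "56"}))
```

Like any split on the delimiter, this sketch assumes no field value contains an embedded "\x01".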

Phil Cooper

No need to split on "\x01", then regex, then filter. If you want just tags 34, 49 and 56 (MsgSeqNum, SenderCompID and TargetCompID), you can use a single regex:

dict(re.findall("(?:^|\x01)(34|49|56)=(.*?)\x01", raw_msg))

Simple regexes like this work only if you know your sender never embeds data that could trip up a simple regex, for example a value that itself contains the delimiter. Specifically:

  1. No raw data fields, which are really a pair of a length field and a data field, such as RawDataLength, RawData (95/96) or XmlDataLen, XmlData (212/213)
  2. No encoded fields for Unicode strings, such as EncodedTextLen, EncodedText (354/355)

Handling those cases takes a lot of additional parsing. I use a custom Python parser, but even the fixlib code you referenced above gets these cases wrong. If your data is clear of these exceptions, though, the regex above should return a nice dict of your desired fields.
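To give a rough idea of what the additional parsing involves, here is a small sketch of a field-by-field walk that honours length-prefixed data fields. It is not the custom parser mentioned above. It assumes the message is bytes, that each length field is immediately followed by its data field, and that the length is a byte count; the function name and the sample message are made up for illustration.

```python
SOH = b"\x01"
# Length tag -> data tag, for the pairs named in the list above.
LENGTH_TAGS = {b"95": b"96", b"212": b"213", b"354": b"355"}

def parse_with_data_fields(raw):
    """Return a list of (tag, value) pairs, reading data fields by length."""
    fields = []
    pos = 0
    pending = None  # (data_tag, length) announced by the previous field
    while pos < len(raw):
        eq = raw.index(b"=", pos)  # raises ValueError on malformed input
        tag = raw[pos:eq]
        start = eq + 1
        if pending is not None and tag == pending[0]:
            # Data field: take exactly `length` bytes, SOH and "=" included.
            end = start + pending[1]
            if raw[end:end + 1] != SOH:
                raise ValueError("data field is not followed by SOH")
        else:
            end = raw.index(SOH, start)
        value = raw[start:end]
        fields.append((tag, value))
        pending = (LENGTH_TAGS[tag], int(value)) if tag in LENGTH_TAGS else None
        pos = end + 1  # skip the SOH that terminates this field
    return fields

# Made-up message whose RawData value contains both SOH and "=".
data = b"a" + SOH + b"b=c" + SOH + b"d"
raw = SOH.join([b"8=FIX.4.2", b"35=B",
                b"95=" + str(len(data)).encode(),
                b"96=" + data, b"10=000"]) + SOH
print(parse_with_data_fields(raw))
```

A plain split on "\x01" would cut the RawData value of this sample into three pieces, which is the failure mode the two list items describe.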

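Edit: I've left the above regex as-is, but it should be revised so that the final match element is (?=\x01). With a consuming \x01, the delimiter that ends one match is no longer available to begin the next, so a wanted field that immediately follows another wanted field is skipped. A fuller explanation can be found in @tropleee's answer here.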
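The difference between the two patterns shows up as soon as two wanted tags sit next to each other. A short sketch, using a made-up message:

```python
import re

SOH = "\x01"
# Made-up message in which the three wanted tags are adjacent.
raw_msg = SOH.join(["8=FIX.4.2", "34=12", "49=SENDER", "56=TARGET", "10=000"]) + SOH

# Original pattern: each match consumes its trailing \x01, so the next
# field has no leading \x01 left to match against.
consuming = dict(re.findall("(?:^|\x01)(34|49|56)=(.*?)\x01", raw_msg))

# Revised pattern: the trailing \x01 is only asserted by a lookahead and
# stays available as the leading delimiter of the next match.
lookahead = dict(re.findall("(?:^|\x01)(34|49|56)=(.*?)(?=\x01)", raw_msg))

print(consuming)  # tag 49 is missing
print(lookahead)  # all three tags are present
```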

^A is actually \x{01}; that's just how it shows up in vim. In Perl, I did this by splitting on hex 1 and then splitting each field on "=". After the second split, element [0] of the array is the tag and element [1] is the value.
