Regex for ^ | in C#

Posted by 两盒软妹~` on 2019-12-23 06:11:45

Question


I am working on HL7 messages and I need a regex. This doesn't work:

HL7 message=MSH|^~\&|DATACAPTOR|123|123|20100816171948|ORU^R01|081617194802900|P|2.3|8859/1

My regex is:

MSH|^~\&|DATACAPTOR|\d{3}|\d{3}|(\d{4}\d{2}\d{2}\d{2}\d{2}\d{2})|ORU\\^R01|\d{20}|P|2.3|8859/1

Can anybody suggest a regex for special characters? I am using this code:

strRegex = "\\vMSH|^~\\&|DATACAPTOR|\\d{3}|\\d{3}|(\\d{4}\\d{2}\\d{2}\\d{2}\\d{2}\\d{2})|ORU\\^R01|\\d{20}|P|2.3|8859/1";
Regex rx = new Regex(strRegex, RegexOptions.Compiled | RegexOptions.IgnoreCase );

Answer 1:


|, ^, and \ are all special characters in regular expressions, so you'd have to escape them with \. Remember \ is also an escape character within a regular string literal so you'd have to escape that, too:

var strRegex = "\\vMSH\\|\\^~\\\\&\\|DATACAPTOR\\|…

But it's generally a lot easier to use a verbatim string literal (@"…"):

var strRegex = @"\vMSH\|\^~\\&\|DATACAPTOR\|…

Finally, note that (\d{4}\d{2}\d{2}\d{2}\d{2}\d{2}) can be simplified to (\d{14}).
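As a rough sketch of how the fully escaped pattern might look when applied to the sample line from the question: the class and method names below are hypothetical, the leading \v and the IgnoreCase option from the question are left out, and \d{15} is used for the control ID because the sample value has 15 digits rather than the 20 the question's pattern asks for. The . in 2.3 is escaped as well, since an unescaped dot matches any character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Hl7RegexDemo
{
    // Verbatim literal, so the regex engine sees every backslash exactly as written.
    // \| \^ \. escape regex metacharacters; \\& matches a literal backslash followed by &.
    private static readonly Regex MshPattern = new Regex(
        @"^MSH\|\^~\\&\|DATACAPTOR\|\d{3}\|\d{3}\|(\d{14})\|ORU\^R01\|\d{15}\|P\|2\.3\|8859/1$",
        RegexOptions.Compiled);

    // Returns the 14-digit timestamp, or null when the segment does not match.
    public static string ExtractTimestamp(string segment)
    {
        Match m = MshPattern.Match(segment);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Sample line copied from the question.
        string sample = @"MSH|^~\&|DATACAPTOR|123|123|20100816171948|ORU^R01|081617194802900|P|2.3|8859/1";
        Console.WriteLine(ExtractTimestamp(sample)); // prints 20100816171948
    }
}
```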

However, for a structure like this, it's probably easier to just use the Split method.

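// Verbatim literal: \& is not a valid escape sequence in a regular string literal.
var segment = @"MSH|^~\&|DATACAPTOR…";
var fields = segment.Split('|');
// fields[0] is "MSH", fields[1] is the encoding characters, fields[5] is the timestamp.
var timestamp = fields[5];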

Warning: HL7 messages may use different control characters. The 4th character of the MSH segment is the field separator, and the characters that follow it, up to the next field separator, are the remaining delimiters (in this case the control characters are |^~\&). If you don't control your input and these characters may change, it's best to parse the control characters first.
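A minimal sketch of reading the field separator from the segment itself before splitting, instead of hard-coding '|'. The class and method names are hypothetical, and the sketch only handles a single MSH segment, not a full message.

```csharp
using System;

public static class Hl7Delimiters
{
    // Splits an MSH segment using the separator the segment itself declares
    // (the 4th character, right after the "MSH" segment name).
    public static string[] SplitMsh(string segment)
    {
        if (segment == null || segment.Length < 4 ||
            !segment.StartsWith("MSH", StringComparison.Ordinal))
        {
            throw new FormatException("Not an MSH segment.");
        }

        char fieldSeparator = segment[3];
        return segment.Split(fieldSeparator);
    }

    public static void Main()
    {
        // Sample line copied from the question.
        string sample = @"MSH|^~\&|DATACAPTOR|123|123|20100816171948|ORU^R01|081617194802900|P|2.3|8859/1";
        string[] fields = SplitMsh(sample);

        // The first encoding character is the component separator (^ in this sample).
        char componentSeparator = fields[1][0];
        string[] messageType = fields[6].Split(componentSeparator);

        Console.WriteLine(fields[5]);       // prints 20100816171948
        Console.WriteLine(messageType[1]);  // prints R01
    }
}
```

Because the separators come from the input, the same code would also split a segment that declares, say, # as its field separator.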




Answer 2:


To me, your question describes two distinct problems.

Problem 1) "..I need a regex..this doesn't work..My regex is..anybody suggest a (better) regex..?"

This is the good part of your question.

As @p-s-w-g already pointed out, some special characters in regular expressions must be escaped. The Microsoft Developer Network page "Character Escapes in Regular Expressions" tells you which characters are special and how to escape them.

To easily test whether your regex recognizes the grammar, you may find interactive regex testing tools useful, e.g. Regex Hero or The Regulator.

Problem 2) "I am working on HL7 messages..this doesn't work..My regex is..anybody suggest a (better) regex..?"

This is the bad part of your question.

The example shown in your question,

MSH|^~\&|DATACAPTOR|123|123|20100816171948|ORU^R01|081617194802900|P|2.3|8859/1

is not a valid HL7 message fragment. It resembles HL7, but it has already been damaged, probably by some text pre-processing code. HL7 v2 messages are not transmitted over a text protocol that can be manipulated with text tools. The protocol is binary, yet partially readable, so humans can inspect it without special tools. It is still a binary protocol and must be processed as such. Regex is a tool for working with text strings, not binary strings. And although it may seem possible to outsmart an ancient 20-year-old protocol with a new-age regex one-liner, it is not a good approach. I have tried to explain why in the comments on your question.

Basic decoding of the fragment is:

MSH-0: MSH
MSH-1: |
MSH-2: ^~\&
MSH-3: DATACAPTOR
MSH-4: 123
MSH-5: 123
MSH-6: ! missing !
MSH-7: 20100816171948
MSH-8: ! missing !
MSH-9: ORU^R01
MSH-10: 081617194802900
MSH-11: P
MSH-12: 2.3
MSH-13: ! missing !
MSH-14: ! missing !
MSH-15: ! missing !
MSH-16: ! missing !
MSH-17: ! missing !
MSH-18: 8859/1

The fields marked ! missing ! really are missing. In a normal MSH segment they would be present at their corresponding positions, just with a default empty value.
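To illustrate what "present but empty" looks like, here is a small sketch that numbers the fields of an MSH segment the way the decoding above does. The sample segment is hypothetical: it was constructed to match the numbering above by putting back two empty fields, and it is not the asker's original message. The class and method names are assumptions as well.

```csharp
using System;

public static class MshFieldNumbering
{
    // Returns an array where result[n] holds MSH-n and result[0] holds the segment name.
    // MSH-1 is the field separator itself, so split index i (for i >= 1) maps to MSH-(i + 1).
    public static string[] NumberFields(string segment)
    {
        char separator = segment[3];
        string[] parts = segment.Split(separator);

        string[] numbered = new string[parts.Length + 1];
        numbered[0] = parts[0];
        numbered[1] = separator.ToString();
        for (int i = 1; i < parts.Length; i++)
        {
            numbered[i + 1] = parts[i];
        }
        return numbered;
    }

    public static void Main()
    {
        // Hypothetical segment: two adjacent separators (||) mark an empty field.
        string sample = @"MSH|^~\&|DATACAPTOR|123|123||20100816171948||ORU^R01";
        string[] msh = NumberFields(sample);

        for (int n = 0; n < msh.Length; n++)
        {
            Console.WriteLine("MSH-" + n + ": " + (msh[n].Length == 0 ? "(empty)" : msh[n]));
        }
    }
}
```

With the empty fields kept in place, the timestamp lands at MSH-7 and the message type at MSH-9, as in the decoding above.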

Reading Health Level Seven, Version 2.3.1 © 1999, Chapter 2.24.1 "MSH - message header segment", we can see that the message was created 4 years ago, in 2010, probably by Capsule Tech, Inc.'s DataCaptor™. It was formatted according to the rules defined by Health Level Seven, Version 2.3 © 1997, a standard that is 17 years old and has been updated several times, and it was meant to be used in one of the countries listed in Wikipedia: ISO/IEC 8859-1.

I can't see more from your question. But whatever you are trying to do, and whatever data you are going to process for whatever reason, the code fragment you are starting with is already wrong, and in general the regex approach to HL7 parsing is a strange one. If you're working on serious software to be used anywhere in the healthcare industry, please consider writing or using a serious and tested parser, e.g. the one used by the NHapi library: http://sourceforge.net/p/nhapi/code/HEAD/tree/NHapi20/NHapi.Base/Parser/PipeParser.cs



Source: https://stackoverflow.com/questions/25436552/regex-for-in-c-sharp
