RegEx for MetaMap in Java

删除回忆录丶 submitted on 2019-12-12 03:09:55

Question


MetaMap output files contain lines like the following:

mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).

The format is explained as follows:

mappings(
      [map(negated overall score for this mapping, 
            [ev(negated candidate score,'UMLS concept ID','UMLS concept','preferred name for concept - may or may not be different',
                 [matched word or words lowercased that this candidate matches in the phrase - comma separated list],
                 [semantic type(s) - comma separated list],
                 [match map list - see below],candidate involved with head of phrase - yes or no,
                 is this an overmatch - yes or no
               )
            ]
          )
      ]
    ).

I want to run a RegEx query in Java that gives me the strings 'UMLS concept ID', semantic type(s) and match map list. Is RegEx the right tool, or what is the most efficient way to accomplish this in Java?


Answer 1:


Here's my attempt at a regex solution. This replace-based "meta-regexing" approach is something I'm experimenting with; I hope it leads to more readable code.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaMapRegex {
    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String regex =
            "mappings([map(number,[ev(number,<quoted>,quoted,quoted,[csv],[<csv>],[<matchmap>],yesno,yesno)])])."
            .replaceAll("([\\.\\(\\)\\[\\]])", "\\\\$1") // escape metacharacters
            .replace("<", "(").replace(">", ")")          // set up capture groups
            .replace("number", "-?\\d+")                  // signed integer score
            .replace("quoted", "'[^']*'")                 // single-quoted string
            .replace("yesno", "(?:yes|no)")               // yes/no flag
            .replace("csv", "[^\\]]*")                    // comma-separated list, no ']'
            .replace("matchmap", ".*?");                  // nested match map list
        System.out.println(regex);
        // prints "mappings\(\[map\(-?\d+,\[ev\(-?\d+,('[^']*'),'[^']*','[^']*',\[[^\]]*\],\[([^\]]*)\],\[(.*?)\],(?:yes|no),(?:yes|no)\)\]\)\]\)\."

        Matcher m = Pattern.compile(regex).matcher(line);
        if (m.find()) {
            System.out.println(m.group(1)); // prints "'C0018017'"
            System.out.println(m.group(2)); // prints "inpr"
            System.out.println(m.group(3)); // prints "[[1,1],[1,1],0]"
        }
    }
}

This replace-based meta-regexing lets you accommodate whitespace between symbols easily, just by adding the appropriate replace call (instead of sprinkling \s* all over one unreadable mess).
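For example, assuming MetaMap output could contain optional spaces after commas (the sample line has none), one extra replace on the template's commas should be enough. This is a minimal sketch; the class name WhitespaceTolerantMetaMap, its extract helper, and the spaced-out input line are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhitespaceTolerantMetaMap {
    // Same template and replacements as the answer's code, plus one extra replace
    // that turns every literal comma into \s*,\s* (optional whitespace on both sides).
    static final Pattern PATTERN = Pattern.compile(
        "mappings([map(number,[ev(number,<quoted>,quoted,quoted,[csv],[<csv>],[<matchmap>],yesno,yesno)])])."
            .replaceAll("([\\.\\(\\)\\[\\]])", "\\\\$1") // escape metacharacters
            .replace(",", "\\s*,\\s*")                    // tolerate spaces around commas
            .replace("<", "(").replace(">", ")")          // set up capture groups
            .replace("number", "-?\\d+")
            .replace("quoted", "'[^']*'")
            .replace("yesno", "(?:yes|no)")
            .replace("csv", "[^\\]]*")
            .replace("matchmap", ".*?"));

    /** Hypothetical helper: {concept ID, semantic types, match map list}, or null if no match. */
    static String[] extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: the sample line with spaces added after the top-level commas.
        String spaced = "mappings([map(-1000, [ev(-1000, 'C0018017', 'Objective', 'Goals', "
            + "[objective], [inpr], [[[1,1],[1,1],0]], yes, no)])]).";
        String[] g = extract(spaced);
        if (g != null) {
            System.out.println(g[0]); // prints "'C0018017'"
            System.out.println(g[1]); // prints "inpr"
            System.out.println(g[2]); // prints "[[1,1],[1,1],0]"
        }
    }
}
```

Because \s* can match zero characters, the same pattern still accepts the original compact line.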




Answer 2:


That's a truly hairy format. Regex sounds like the way to go, but you're going to have a truly hairy regex:

mappings\(\[map\(-?[0-9.]+,\[ev\(-?[0-9.]+,'(.*?)','.*?','.*?',\[.*?\],\[(.*?)\],\[(.*)\],(?:yes|no),(?:yes|no)\)\]\)\]\)\.

It gets worse when you have to express the regex as a Java String: as always, you'll replace every \ with \\. But this should get you what you want; capturing groups 1, 2, and 3 are the Strings that you wanted to pull out. Note that I haven't rigorously tested it against malformed input because I haven't the stomach for it. :)
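Written out as a Java string literal, the regex above might look like this. It is a sketch; the class name HairyRegexDemo and the extract helper are hypothetical, and the literal is split in two only for line length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HairyRegexDemo {
    // The regex from the answer with every \ doubled to form a valid Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        "mappings\\(\\[map\\(-?[0-9.]+,\\[ev\\(-?[0-9.]+,'(.*?)','.*?','.*?',"
        + "\\[.*?\\],\\[(.*?)\\],\\[(.*)\\],(?:yes|no),(?:yes|no)\\)\\]\\)\\]\\)\\.");

    /** Hypothetical helper: {concept ID, semantic types, match map list}, or null if no match. */
    static String[] extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',"
            + "[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] g = extract(line);
        if (g != null) {
            System.out.println(g[0]); // prints "C0018017" (quotes are outside the group)
            System.out.println(g[1]); // prints "inpr"
            System.out.println(g[2]); // prints "[[1,1],[1,1],0]"
        }
    }
}
```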

For educational purposes: despite its appearance, this wasn't actually hard to construct at all. I just took your sample line and replaced the actual values with the appropriate wildcards, making sure to escape the parens, the brackets, and the dot at the end.
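The same "copy the sample, swap values for wildcards" recipe can also be done without escaping by hand: java.util.regex.Pattern.quote turns each literal piece of the sample into a safe regex fragment. A sketch, assuming the same wildcards as the regex above; the class name QuotedTemplateDemo and the extract helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTemplateDemo {
    static final String NUMBER = "-?[0-9.]+";       // score, e.g. -1000
    static final String YES_NO = "(?:yes|no)";      // head / overmatch flags

    // Literal pieces copied from the sample line go through Pattern.quote, so
    // ( ) [ ] and . need no manual escaping; only the wildcards are raw regex.
    static final Pattern PATTERN = Pattern.compile(
        Pattern.quote("mappings([map(") + NUMBER
        + Pattern.quote(",[ev(") + NUMBER
        + Pattern.quote(",'") + "(.*?)"             // group 1: UMLS concept ID
        + Pattern.quote("','") + ".*?"              // UMLS concept
        + Pattern.quote("','") + ".*?"              // preferred name
        + Pattern.quote("',[") + ".*?"              // matched words
        + Pattern.quote("],[") + "(.*?)"            // group 2: semantic types
        + Pattern.quote("],[") + "(.*)"             // group 3: match map list
        + Pattern.quote("],") + YES_NO + "," + YES_NO
        + Pattern.quote(")])]).");

    /** Hypothetical helper: {concept ID, semantic types, match map list}, or null if no match. */
    static String[] extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',"
            + "[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] g = extract(line);
        if (g != null) {
            System.out.println(g[0]); // prints "C0018017"
            System.out.println(g[1]); // prints "inpr"
            System.out.println(g[2]); // prints "[[1,1],[1,1],0]"
        }
    }
}
```

This builds a pattern equivalent to the hand-escaped one; the trade-off is a longer expression in exchange for never having to count backslashes.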




Answer 3:


It's possible, yes.

Something like the following, assuming that the values you've quoted are the only places where quotes are legal, that the values you've wrapped in [] are the only places where brackets are legal, that '[' and ']' characters can't appear inside values, and that the match map list can't contain ]] except at its end. You get the picture: lots of assumptions . . .

^[^']+?'([^']*+)'[^\[]+\[[^]]+\],\[([^\]]*?)\],\[\[(.*?)\]\].*$

This should give you those three fields as the three capturing groups (tested on your example with http://www.regexplanet.com/simple/index.html). Note that the match map list comes back without its outer brackets, e.g. [1,1],[1,1],0.

As a Java string, that is:

"^[^']+?'([^']*+)'[^\\[]+\\[[^]]+\\],\\[([^\\]]*?)\\],\\[\\[(.*?)\\]\\].*$"

But that isn't very maintainable; it would probably be better to be a bit more verbose with this one!
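One way to be more verbose is Java's Pattern.COMMENTS flag, which ignores unescaped whitespace and treats # as a comment to the end of the line, so the regex can be split up and annotated. A sketch of the same pattern in that style; the class name VerboseMetaMapRegex and the extract helper are hypothetical, and [^]] is written as the equivalent [^\]] for readability.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseMetaMapRegex {
    // Answer 3's regex in COMMENTS mode: whitespace is ignored and '#' starts a
    // comment that runs to the end of the line, so every piece ends with \n.
    static final Pattern PATTERN = Pattern.compile(
          "^ [^']+?               # everything before the first quote\n"
        + "' ([^']*+) '           # group 1: UMLS concept ID, without quotes\n"
        + "[^\\[]+                # skip the remaining quoted fields\n"
        + "\\[ [^\\]]+ \\] ,      # matched words, e.g. objective\n"
        + "\\[ ([^\\]]*?) \\] ,   # group 2: semantic types, e.g. inpr\n"
        + "\\[\\[ (.*?) \\]\\]    # group 3: match map list minus its outer brackets\n"
        + ".* $                   # rest of the line\n",
        Pattern.COMMENTS);

    /** Hypothetical helper: {concept ID, semantic types, match map list}, or null if no match. */
    static String[] extract(String line) {
        Matcher m = PATTERN.matcher(line);
        return m.find() ? new String[] {m.group(1), m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        String line = "mappings([map(-1000,[ev(-1000,'C0018017','Objective','Goals',"
            + "[objective],[inpr],[[[1,1],[1,1],0]],yes,no)])]).";
        String[] g = extract(line);
        if (g != null) {
            System.out.println(g[0]); // prints "C0018017"
            System.out.println(g[1]); // prints "inpr"
            System.out.println(g[2]); // prints "[1,1],[1,1],0"
        }
    }
}
```

The behavior is unchanged from the compact version; the comments only document which group is which and what each piece skips.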



Source: https://stackoverflow.com/questions/2728910/regex-for-metamap-in-java
