Question:
I am using Stanford CoreNLP (the January 2016 version) and I would like to keep punctuation in the dependency relations. I have found some ways of doing that when running it from the command line, but nothing about doing it from the Java code that extracts the dependency relations.
Here is my current code. It works, but the output contains no punctuation:
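import java.util.Collection;
import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.util.CoreMap;
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
// Second, separate parser instance used to re-parse each sentence
LexicalizedParser lp = LexicalizedParser.loadModel(modelPath + lexparserNameEn,
        "-maxLength", "200", "-retainTmpSubcategories");
TreebankLanguagePack tlp = new PennTreebankLanguagePack();
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
String parsedText = "";
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    List<CoreLabel> words = sentence.get(CoreAnnotations.TokensAnnotation.class);
    Tree parse = lp.apply(words);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    Collection<TypedDependency> td = gs.typedDependencies();
    parsedText += td.toString() + "\n";
}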
Any kind of dependency representation is fine for me (basic, typed, collapsed, etc.); I just want the punctuation marks to be included.
Thanks in advance,
Answer 1:
You are doing quite a bit of extra work here: you run the parser once through CoreNLP and then a second time by calling lp.apply(words).
The easiest way to get a dependency tree/graph that includes punctuation marks is to set the CoreNLP option parse.keepPunct, as follows:
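import java.util.List;
import java.util.Properties;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
Annotation document = new Annotation(text);
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, parse");
props.setProperty("ssplit.newlineIsSentenceBreak", "always");
props.setProperty("ssplit.eolonly", "true");
props.setProperty("pos.model", modelPath1);
props.setProperty("parse.model", modelPath);
// Keep punctuation tokens in the dependency output
props.setProperty("parse.keepPunct", "true");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
pipeline.annotate(document);
// The pipeline has already parsed every sentence, so no second parser call is needed
List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    // Pick whichever representation you want
    SemanticGraph basicDeps = sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
    SemanticGraph collapsed = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
    SemanticGraph ccProcessed = sentence.get(SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation.class);
}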
The sentence annotation stores each dependency tree/graph as a SemanticGraph. If you want a list of TypedDependency objects instead, call the graph's typedDependencies() method. For example:
// typedDependencies() returns a Collection; copy it if you need a List
List<TypedDependency> dependencies = new ArrayList<>(basicDeps.typedDependencies());
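To make the effect of parse.keepPunct concrete, here is a stdlib-only Java sketch that does not call CoreNLP at all. It assumes the option works by choosing between a punctuation-rejecting word filter and an accept-everything filter when the dependencies are built. The Dep class, the PUNCT set and the sample sentence are hypothetical stand-ins made up for illustration (CoreNLP's language pack defines its own punctuation list); each Dep prints in the reln(governor-index, dependent-index) style of TypedDependency.toString().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class PunctFilterSketch {

    // Hypothetical stand-in for a typed dependency: reln(gov-govIdx, dep-depIdx).
    static final class Dep {
        final String reln;
        final String gov;
        final int govIdx;
        final String dep;
        final int depIdx;

        Dep(String reln, String gov, int govIdx, String dep, int depIdx) {
            this.reln = reln;
            this.gov = gov;
            this.govIdx = govIdx;
            this.dep = dep;
            this.depIdx = depIdx;
        }

        @Override
        public String toString() {
            return reln + "(" + gov + "-" + govIdx + ", " + dep + "-" + depIdx + ")";
        }
    }

    // Assumed, simplified punctuation set (not CoreNLP's actual list).
    static final Set<String> PUNCT = Set.of(".", ",", ":", ";", "?", "!", "``", "''", "-LRB-", "-RRB-");

    // Hand-written dependencies for "John likes tea ." (sample data, not real parser output).
    static final List<Dep> SAMPLE = List.of(
            new Dep("root", "ROOT", 0, "likes", 2),
            new Dep("nsubj", "likes", 2, "John", 1),
            new Dep("dobj", "likes", 2, "tea", 3),
            new Dep("punct", "likes", 2, ".", 4));

    // Keeps a dependency only if its dependent word passes the word filter.
    static List<Dep> filter(List<Dep> deps, Predicate<String> wordFilter) {
        List<Dep> kept = new ArrayList<>();
        for (Dep d : deps) {
            if (wordFilter.test(d.dep)) {
                kept.add(d);
            }
        }
        return kept;
    }

    // keepPunct == true: accept every word; false: reject punctuation words.
    static String render(List<Dep> deps, boolean keepPunct) {
        Predicate<String> wordFilter;
        if (keepPunct) {
            wordFilter = w -> true;
        } else {
            wordFilter = w -> !PUNCT.contains(w);
        }
        return filter(deps, wordFilter).toString();
    }

    public static void main(String[] args) {
        System.out.println(render(SAMPLE, false));
        System.out.println(render(SAMPLE, true));
    }
}
```

Running main prints [root(ROOT-0, likes-2), nsubj(likes-2, John-1), dobj(likes-2, tea-3)] first and then the same list with punct(likes-2, .-4) appended, which is the kind of difference you should see in td.toString() once punctuation is kept.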
Source: https://stackoverflow.com/questions/37130722/how-to-keep-punctuation-in-stanford-dependency-parser