Using Stanford parser to parse Chinese

Submitted by 眉间皱痕 on 2020-01-06 08:14:21

Question


Here is my code, mostly from the demo. The program runs without errors, but the result is wrong: it does not split the words. Thank you.

import java.io.StringReader;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.trees.*;

public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/xinhuaFactored.ser.gz");
    demoAPI(lp);
}

public static void demoAPI(LexicalizedParser lp) {
    // This option shows loading and using an explicit tokenizer
    String sent2 = "我爱你";
    TokenizerFactory<CoreLabel> tokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    Tokenizer<CoreLabel> tok =
        tokenizerFactory.getTokenizer(new StringReader(sent2));
    List<CoreLabel> rawWords2 = tok.tokenize();

    Tree parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    // You can also use a TreePrint object to print trees and dependencies
    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
}

Answer 1:


Did you make sure to segment the words? For example, try running it again with "我 爱 你" as the sentence. I believe the parser segments automatically when run from the command line, but I'm not sure what it does from within Java.
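The point of the answer is that Chinese text carries no whitespace, so a PTB-style tokenizer has no word boundaries to work with and passes the whole sentence to the parser as one token. A minimal, library-independent sketch of the difference (the class name SegmentCheck is just for illustration, not part of Stanford NLP):

```java
public class SegmentCheck {
    public static void main(String[] args) {
        // Unsegmented Chinese has no whitespace, so splitting on
        // whitespace leaves the whole sentence as a single token.
        String raw = "我爱你";
        System.out.println(raw.split("\\s+").length);        // prints 1

        // Pre-segmented input, with spaces between words, splits into
        // the three word tokens the parser actually expects.
        String segmented = "我 爱 你";
        System.out.println(segmented.split("\\s+").length);  // prints 3
    }
}
```

So either pre-segment the sentence yourself before handing it to the tokenizer, or run a Chinese word segmenter first; Stanford also distributes a separate Chinese segmenter package that can produce such segmented input.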



Source: https://stackoverflow.com/questions/22703614/using-stanford-parser-to-parse-chinese
