I want to use Stanford Parser to create a .conll file for further processing. So far I have managed to parse the test sentence with this command:
stanford-parser-full-2013-06-20/lexparser.sh stanford-parser-full-2013-06-20/data/testsent.txt > output.txt
Instead of a .txt file, I would like the output in .conll format. I'm pretty sure this is possible, as it is mentioned in the documentation. Can I somehow modify my command, or will I have to write Java code?
Thanks for the help!
If you're looking for dependencies printed in CoNLL-X (CoNLL 2006) format, try this from the command line:
java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn" edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz stanford-parser-full-2013-06-20/data/testsent.txt >testsent.tree
java -mx150m -cp "stanford-parser-full-2013-06-20/*:" edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile testsent.tree -conllx
Here's the output for the first test sentence:
1 Scores _ NNS NNS _ 4 nsubj _ _
2 of _ IN IN _ 0 erased _ _
3 properties _ NNS NNS _ 1 prep_of _ _
4 are _ VBP VBP _ 0 root _ _
5 under _ IN IN _ 0 erased _ _
6 extreme _ JJ JJ _ 8 amod _ _
7 fire _ NN NN _ 8 nn _ _
8 threat _ NN NN _ 4 prep_under _ _
9 as _ IN IN _ 13 mark _ _
10 a _ DT DT _ 12 det _ _
11 huge _ JJ JJ _ 12 amod _ _
12 blaze _ NN NN _ 15 xsubj _ _
13 continues _ VBZ VBZ _ 4 advcl _ _
14 to _ TO TO _ 15 aux _ _
15 advance _ VB VB _ 13 xcomp _ _
16 through _ IN IN _ 0 erased _ _
17 Sydney _ NNP NNP _ 20 poss _ _
18 's _ POS POS _ 0 erased _ _
19 north-western _ JJ JJ _ 20 amod _ _
20 suburbs _ NNS NNS _ 15 prep_through _ _
21 . _ . . _ 4 punct _ _
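For the further processing the question mentions, each row above carries the ten CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), and sentences are separated by blank lines. In this collapsed output, words such as "of" are marked `erased` with head 0 because they were folded into relations like `prep_of`, so head 0 alone does not identify the root; the `root` relation does. Below is a minimal reader sketch in plain Java (16+, for the record). The class `ConllX` is hypothetical, not part of Stanford Parser, and splitting on any whitespace assumes no token contains a space.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal reader for CoNLL-X (CoNLL 2006) dependency rows,
// such as the output of EnglishGrammaticalStructure -conllx.
final class ConllX {

    private ConllX() {}

    // One token row: the ten CoNLL-X columns, with ID and HEAD as integers.
    record Token(int id, String form, String lemma, String cpostag, String postag,
                 String feats, int head, String deprel, String phead, String pdeprel) {}

    // Groups rows into sentences; a blank line ends the current sentence.
    static List<List<Token>> parse(List<String> lines) {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // Files use tabs; splitting on any whitespace also accepts the spaced rows shown above.
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 10) {
                throw new IllegalArgumentException("expected 10 columns: " + line);
            }
            current.add(new Token(Integer.parseInt(cols[0]), cols[1], cols[2], cols[3], cols[4],
                    cols[5], Integer.parseInt(cols[6]), cols[7], cols[8], cols[9]));
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    // Returns the token attached to the artificial root (head 0 with relation "root"),
    // skipping "erased" tokens that also carry head 0 in collapsed output.
    static Token root(List<Token> sentence) {
        for (Token t : sentence) {
            if (t.head() == 0 && t.deprel().equals("root")) {
                return t;
            }
        }
        throw new IllegalArgumentException("no root token in sentence");
    }
}
```

For example, `ConllX.parse(Files.readAllLines(Path.of("testsent.conll")))` would read a saved output file (the file name here is only illustrative).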
I'm not sure whether you can do this from the command line, but here is a Java version:
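// Set up once: the parser model (same path as in the command-line answer) and a dependency converter
LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
GrammaticalStructureFactory gsf = new PennTreebankLanguagePack().grammaticalStructureFactory();
// DocumentPreprocessor(String) reads the file at that path (a StringReader would parse the path string itself)
for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
    Tree parse = lp.apply(sentence);
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    // conllx = true prints the dependencies as CoNLL-X rows; extraSep = false
    GrammaticalStructure.printDependencies(gs, gs.typedDependencies(), parse, true, false);
}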
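`printDependencies` writes to standard output (at least in the versions I am aware of), so one way to end up with a .conll file from Java is to redirect `System.out` while the parsing loop runs; from the shell, appending `> testsent.conll` to the command does the same. The helper below is a minimal, standard-library-only sketch of that redirect. The class name `StdoutToFile` and the output file name are hypothetical, and the `PrintStream` constructor taking a `Charset` needs Java 10+.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: runs an action while System.out is redirected to a file,
// so whatever the action prints (e.g. CoNLL-X rows) is written to that file.
final class StdoutToFile {

    private StdoutToFile() {}

    static void run(Path target, Runnable action) throws IOException {
        PrintStream original = System.out;
        // Creates or truncates the target file; autoFlush = true, UTF-8 output.
        try (PrintStream out = new PrintStream(Files.newOutputStream(target), true, StandardCharsets.UTF_8)) {
            System.setOut(out);
            try {
                action.run();
            } finally {
                // Restore the original stream before the file stream is closed.
                System.setOut(original);
            }
        }
    }
}
```

Usage would look like `StdoutToFile.run(Path.of("testsent.conll"), () -> { /* the parsing loop */ });`. The loop fits in a `Runnable` lambda because it throws no checked exceptions.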
Source: https://stackoverflow.com/questions/17450652/create-conll-file-as-output-of-stanford-parser