XML format in Stanford POS tagger

Submitted by ﹥>﹥吖頭↗ on 2019-12-25 17:00:30

Question


I have tagged 20 sentences, and this is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

import edu.stanford.nlp.tagger.maxent.MaxentTagger;

public class myTag {

    public static void main(String[] args) {

        // Load the tagger configuration; the reader is closed automatically.
        Properties props = new Properties();
        try (Reader propsReader = new FileReader("D:/tagger/english-bidirectional-distsim.tagger.props")) {
            props.load(propsReader);
        } catch (IOException e) { // also covers FileNotFoundException
            e.printStackTrace();
        }

        MaxentTagger tagger = new MaxentTagger("D:/tagger/english-bidirectional-distsim.tagger", props);

        // Read the input file line by line and tag each line separately.
        try (BufferedReader br = new BufferedReader(new FileReader("C:/Users/chelsea/Desktop/EN/EN.txt"))) {
            String sCurrentLine;
            while ((sCurrentLine = br.readLine()) != null) {
                String tagged = tagger.tagString(sCurrentLine);
                System.out.println(tagged);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

This is the output:

As you can see, the sentence node has an id attribute, and here it is always 0, which it should not be. I expect the values 0, 1, 2, 3, 4, ... I don't understand what is wrong with my code.


Answer 1:


The Stanford POS tagger (strictly speaking, the sentence splitter that runs before the POS annotator) numbers sentences separately for each input text. In your loop you ask the tagger to tag sCurrentLine, which contains one sentence. That text is split into sentences (here just one), and that sentence gets id = 0. On the next iteration you pass a new text, the next sCurrentLine; again it contains only one sentence, so that sentence is again the first one and again gets id = 0, and so on.
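The per-text numbering can be illustrated with a toy model. The class and the `splitAndNumber` method below are hypothetical stand-ins, not Stanford's sentence splitter: they naively split on `.`, `!` or `?` followed by whitespace, only to show that a counter local to each call restarts at 0 every time, while a single call over the joined text yields 0, 1, 2.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT the Stanford sentence splitter or tagger.
public class SentenceIdDemo {

    // Hypothetical stand-in for one tagString() call: splits the text naively
    // and numbers the sentences starting from 0 within this call.
    static List<String> splitAndNumber(String text) {
        List<String> out = new ArrayList<>();
        int id = 0; // local to this call, so it restarts at 0 on every call
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                out.add("<sentence id=\"" + id++ + "\">" + s + "</sentence>");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = {"The cat sat.", "It purred.", "Then it slept."};

        // One call per line: each line is the first (and only) sentence of its own text.
        for (String line : lines) {
            System.out.println(splitAndNumber(line));
        }

        // One call for the whole text: the ids run across all sentences.
        System.out.println(splitAndNumber(String.join(" ", lines)));
    }
}
```

The first three printed lists each contain a sentence with id 0; the last list contains ids 0, 1 and 2.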

So if you want ids that run across the whole file, first build the whole text and then pass it to the tagger in a single call. However, if your input text is already split into sentences (one per line), it is better to leave things as they are and, if you need ids, generate them yourself in the loop.
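A minimal sketch of both options, under some assumptions. The tagger is passed in as a `Function<String, String>` so the code compiles without Stanford's jar; with the real library you would pass `tagger::tagString`. The class name `TagWithIds`, the method names and the `<sentence id="...">` wrapper in option 2 are hypothetical: the wrapper is our own markup, intended for a plain (non-XML) tagger output format, not the tagger's XML.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class TagWithIds {

    // Option 1: join the whole file into one String and tag it in a single call,
    // so the tagger's own sentence splitter numbers the sentences.
    static String tagWholeFile(Path file, Function<String, String> tag) throws IOException {
        String wholeText = String.join("\n", Files.readAllLines(file));
        return tag.apply(wholeText);
    }

    // Option 2: input is already one sentence per line, so keep tagging line by line
    // and keep ONE counter for the whole loop. The <sentence id="..."> wrapper is
    // hypothetical markup of our own, not the tagger's output.
    static List<String> tagLinesWithIds(List<String> lines, Function<String, String> tag) {
        List<String> out = new ArrayList<>();
        int id = 0; // declared outside the loop, so it does not reset per line
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines so the ids stay contiguous
            }
            out.add("<sentence id=\"" + id++ + "\">" + escapeXml(tag.apply(line)) + "</sentence>");
        }
        return out;
    }

    // Minimal escaping so tokens such as "AT&T/NNP" keep the XML well-formed.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stub standing in for tagger::tagString.
        Function<String, String> fakeTagger = s -> s.trim() + "/TAGGED";

        List<String> lines = Arrays.asList("The cat sat.", "It purred.", "Then it slept.");
        tagLinesWithIds(lines, fakeTagger).forEach(System.out::println);

        Path tmp = Files.createTempFile("EN", ".txt");
        Files.write(tmp, lines);
        System.out.println(tagWholeFile(tmp, fakeTagger));
        Files.delete(tmp);
    }
}
```

Option 1 trusts the tagger's splitter, which may not split exactly at your line breaks (for example, a line without final punctuation can be merged with the next one). Option 2 keeps your existing one-sentence-per-line split; if the XML in the question comes from an `outputFormat` setting in the props file, switch that to a plain format first, or the tagger's own `<sentence id="0">` element would end up nested inside the wrapper.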



Source: https://stackoverflow.com/questions/29443556/xml-format-in-stanford-pos-tagger
