Score each sentence in a line based upon a tag and summarize the text. (Java)

孤街浪徒 submitted on 2019-12-06 09:51:37

Finding sentence breaks can be a bit more involved than just looking for [.?!]; consider using BreakIterator.getSentenceInstance().

Its performance is actually quite similar to LingPipe's (more complex) implementation, and better than the one in OpenNLP (from my own testing, at least).

Sample Code

    BreakIterator bi = BreakIterator.getSentenceInstance();
    bi.setText(text);
    int end, start = bi.first();
    while ((end = bi.next()) != BreakIterator.DONE) {
        String sentence = text.substring(start, end);
        start = end;
    }
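For anyone who wants to try this outside a larger program, here is a minimal self-contained sketch of the same loop. The class name `SentenceSplitDemo`, the helper `splitSentences`, and the sample text are made up for illustration; only `BreakIterator` itself comes from the JDK. I pass `Locale.ENGLISH` explicitly on the assumption that the input is English.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceSplitDemo {

    // Splits text into sentences with the JDK's locale-aware BreakIterator.
    // Each sentence is trimmed, because a boundary includes trailing whitespace.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        bi.setText(text);
        int end, start = bi.first();
        while ((end = bi.next()) != BreakIterator.DONE) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
            start = end;
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Illustrative input: the decimal point in "3.5" is not a sentence end.
        String text = "The price rose to 3.5 dollars. Was that expected? Nobody knew!";
        for (String sentence : splitSentences(text)) {
            System.out.println(sentence);
        }
    }
}
```

On this input the iterator should yield three sentences and keep "3.5" intact, whereas a plain split on [.?!] would cut the number in two.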

Edit

I think this is what you're looking for:

    Pattern tagFinder = Pattern.compile("/JJ");
    BufferedReader reader = getMyReader();
    String line = null;
    while ((line = reader.readLine()) != null) {
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(line);
        int end, start = bi.first();
        while ((end = bi.next()) != BreakIterator.DONE) {
            String sentence = line.substring(start, end);
            String tagged = tagger.tagString(sentence);
            int score = 0;
            Matcher tag = tagFinder.matcher(tagged);
            while (tag.find())
                score++;
            if (score > 1)
                writerForTempFile.println(sentence);
            start = end;
        }
    }
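If you want to check the scoring logic without wiring up a real tagger, something like the following sketch may help. Everything named here is an assumption for illustration: `AdjectiveScorer`, `selectSentences`, and especially `toyTagger`, which just appends "/JJ" to a few hard-coded words and stands in for whatever `tagger.tagString` does in your setup.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdjectiveScorer {

    // Compiled once; note that "/JJ" also matches tags such as "/JJR" and "/JJS".
    private static final Pattern TAG_FINDER = Pattern.compile("/JJ");

    // Counts how many times the tag occurs in an already-tagged sentence.
    static int score(String taggedSentence) {
        Matcher tag = TAG_FINDER.matcher(taggedSentence);
        int score = 0;
        while (tag.find()) {
            score++;
        }
        return score;
    }

    // Returns the untagged sentences whose tagged form scores above 1.
    static List<String> selectSentences(String text, UnaryOperator<String> tagger) {
        List<String> selected = new ArrayList<>();
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        bi.setText(text);
        int end, start = bi.first();
        while ((end = bi.next()) != BreakIterator.DONE) {
            String sentence = text.substring(start, end).trim();
            if (score(tagger.apply(sentence)) > 1) {
                selected.add(sentence);
            }
            start = end;
        }
        return selected;
    }

    // Hypothetical stand-in for a real POS tagger: tags three known adjectives.
    static String toyTagger(String sentence) {
        return sentence.replaceAll("\\b(quick|lazy|brown)\\b", "$1/JJ");
    }

    public static void main(String[] args) {
        String text = "The quick brown fox ran. The dog slept.";
        for (String sentence : selectSentences(text, AdjectiveScorer::toyTagger)) {
            System.out.println(sentence);
        }
    }
}
```

Passing the tagger in as a `UnaryOperator<String>` is only a convenience for testing; in real code you would hand over a lambda that calls your tagger.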

Without understanding all of it, my guess is that your code should look more like this:

    int lastMatch = 0; // Added

    Pattern pattern = Pattern.compile("[.?!]"); // find sentence-ending punctuation
    Matcher matcher = pattern.matcher(tagged);
    while(matcher.find())
    {
        Pattern tagFinder = Pattern.compile("/JJ"); // find adjective tag

        // HERE START OF MY CHANGE
        String sentence = tagged.substring(lastMatch, matcher.end());
        lastMatch = matcher.end();
        Matcher tagMatcher = tagFinder.matcher(sentence);
        // HERE END OF MY CHANGE

        while(tagMatcher.find())
        {
            score++; // increase score of sentence for every occurrence of adjective tag
        }
        if(score > 1)
            writerForTempFile.write(sentence);
        score = 0; // reset before scoring the next sentence
    }
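A runnable variant of this regex approach could look like the sketch below. The names `RegexTagScorer`, `countTags`, and `selectTagged` are invented for the example, and it assumes the tagged text does not tag the punctuation itself (output such as "./." would make [.?!] match twice per sentence). It also checks the text left over after the last terminator, which the loop above would silently drop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTagScorer {

    private static final Pattern SENTENCE_END = Pattern.compile("[.?!]");
    private static final Pattern TAG_FINDER = Pattern.compile("/JJ");

    // Counts occurrences of the adjective tag in a piece of tagged text.
    static int countTags(String taggedText) {
        Matcher m = TAG_FINDER.matcher(taggedText);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Splits tagged text at [.?!] and keeps the pieces with more than one tag.
    static List<String> selectTagged(String tagged) {
        List<String> selected = new ArrayList<>();
        int lastMatch = 0;
        Matcher matcher = SENTENCE_END.matcher(tagged);
        while (matcher.find()) {
            String sentence = tagged.substring(lastMatch, matcher.end()).trim();
            lastMatch = matcher.end();
            if (countTags(sentence) > 1) {
                selected.add(sentence);
            }
        }
        // Handle trailing text that has no closing punctuation.
        String rest = tagged.substring(lastMatch).trim();
        if (!rest.isEmpty() && countTags(rest) > 1) {
            selected.add(rest);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Illustrative tagged input; the tag format is an assumption.
        String tagged = "A/DT big/JJ red/JJ ball/NN. It/PRP fell/VBD. Two/CD old/JJ gray/JJ cats/NNS";
        selectTagged(tagged).forEach(System.out::println);
    }
}
```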