How to extract noun phrases from the parsed text

前提是你 提交于 2019-12-04 11:59:10

While writing your own tree parser is a good exercise (!), if you just want results, the easiest way is to use more of the functionality of the Stanford NLP tools, namely Tregex, which is designed for just such things. You can change your final while loop to something like this:

TregexPattern tPattern = TregexPattern.compile("NP");
while ((line = br.readLine()) != null) {
    Tree t = Tree.valueOf(line);
    TregexMatcher tMatcher = tPattern.matcher(t);
    while (tMatcher.find()) {
      System.out.println(tMatcher.getMatch());
    }
}

Here you go. I changed it around a little bit and it got messy but I can clean it up if you really need the code pretty.

import java.io.*;
import java.util.*;

public class nounPhrase {
    public static void main(String[] args)throws IOException{

        ArrayList<String> npList = new ArrayList<String>();
        String line = "";
        String line1 = "";

        String Input = "/local/Input/Temp/Temp.txt";
        String Output = "/local/Output/Temp/Temp-out.txt";

        FileInputStream  fis = new FileInputStream (Input);
        BufferedReader br = new BufferedReader(new InputStreamReader(fis,"UTF-8"));

        while ((line = br.readLine()) != null){
            char[] lineArray = line.toCharArray();
            int temp;
            for (int i=0; i+2<lineArray.length; i++){
                if(lineArray[i]=='(' && lineArray[i+1]=='N' && lineArray[i+2]=='P'){
                    temp = i;
                    while(lineArray[i] != ')'){
                        i++;
                    }
                    i+=2;
                    line1 = line.substring(temp,i);
                    npList.add(line1);
                }
            }
            npList.add("*");
        }

        for (int i=0; i<npList.size(); i++){
            if(!(npList.get(i).equals("*"))){
                System.out.print(npList.get(i));
                if(i<npList.size()-1 && npList.get(i+1).equals("*")){
                    System.out.println();
                }
            }
        }
    }
} 

and fyi the main reason your code only selected the first occurrence of the NP is because you used the indexOf method to find the location. IndexOf ALWAYS and ONLY takes the first occurrence of a String you are searching for.

You have to iterate on parse tree and change index for Noun Phrase after getting first NP Phrase, simple approach can be just substring your line variable and start index of this substring will be np+1. Following are the changes you can make into your code:

while ((line = br.readLine())!= null){
        char[] lineArray = line.toCharArray();
        int indexOfNP = line.indexOf("(NP");
        while(indexOfNP!=-1) {
            np = findClosingParen(lineArray, indexOfNP);
            line1 = line.substring(indexOfNP, np + 1);
            System.out.print(line1 + "\n");
            npList.add(line1);
            line = line.substring(np+1);
            indexOfNP = line.indexOf("(NP");
            lineArray = line.toCharArray();
        }
}

For Recursive Solution:

public static void main(String[] args) throws IOException {

    ArrayList<String> npList = new ArrayList<String>();
    String line;
    String Input = "/local/Input/Temp/Temp.txt";
    String Output = "/local/Output/Temp/Temp-out.txt";

    FileInputStream fis = new FileInputStream (Input);
    BufferedReader br = new BufferedReader(new InputStreamReader(fis,"UTF-8"));
    while ((line = br.readLine())!= null){
        int indexOfNP = line.indexOf("(NP");
        if(indexOfNP>=0)
            extractNPs(npList,line,indexOfNP);
    }

    for(String npString:npList){
        System.out.println(npString);
    }

    br.close();
    fis.close();

}

public static ArrayList<String> extractNPs(ArrayList<String> arr,String  
                                                   parse, int indexOfNP){
    if(indexOfNP==-1){
        return arr;
    }
    else{
        int npIndex = findClosingParen(parse.toCharArray(), indexOfNP);
        String mainNP = new String(parse.substring(indexOfNP, npIndex + 1));
        arr.add(mainNP);
        //Uncomment Lines below if you also want MainNP along with all NPs     
        //within MainNP to be extracted
        /*
        mainNP = new String(mainNP.substring(3));
        if(mainNP.indexOf("(NP")>0){
            return extractNPs(arr,mainNP,mainNP.indexOf("(NP"));
        }
        */
        parse = new String(parse.substring(npIndex+1));
        indexOfNP = parse.indexOf("(NP");
        return extractNPs(arr,parse,indexOfNP);
    }
}

You are building a parser (.. for the code your natural language parser generated), which is a subject with wide academic documentation. The most simple parser you can build is a LL parser. Have a look at this artcle from wikipedia, that has some pretty good examples for you to get inspired: http://en.wikipedia.org/wiki/LL_parser

The wikipedia entry on parsing in general might give you a taste of the field of parsing in general: Wikipedia article : http://en.wikipedia.org/wiki/Parsing

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!