Chunking Stanford Named Entity Recognizer (NER) outputs from NLTK format

不思量自难忘° 2020-12-15 10:29

I am using the Stanford NER tagger through NLTK to find persons, locations, and organizations in sentences. I am able to produce results like this:

[(u'Remaking', u'O'), (u'


        
4 answers
  •  误落风尘
    2020-12-15 11:22

    You can use the standard NLTK way of representing chunks using nltk.Tree. This might mean that you have to change your representation a bit.

    What I usually do is represent NER-tagged sentences as lists of triplets:

    sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
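
When the NER tool returns (word, tag) pairs, as in the question, and the POS tags come from a separate tagger, the triplets can be assembled by pairing up the two outputs. The following is only a minimal sketch: `to_triplets` is a hypothetical helper (not part of NLTK), and it assumes both taggers tokenized the sentence identically.

```python
def to_triplets(pos_tagged, ner_tagged):
    """Merge [(word, pos), ...] and [(word, ner), ...] into [(word, pos, ner), ...].

    Hypothetical helper: assumes both inputs cover the same tokens in the same order.
    """
    if len(pos_tagged) != len(ner_tagged):
        raise ValueError("POS and NER outputs have different lengths")
    triplets = []
    for (word, pos), (ner_word, ner) in zip(pos_tagged, ner_tagged):
        # Guard against the two tools having tokenized the text differently.
        if word != ner_word:
            raise ValueError("token mismatch: %r vs %r" % (word, ner_word))
        triplets.append((word, pos, ner))
    return triplets


pos_tagged = [('Andrew', 'NNP'), ('is', 'VBZ'), ('in', 'IN'), ('Dallas', 'NNP')]
ner_tagged = [('Andrew', 'PERSON'), ('is', 'O'), ('in', 'O'), ('Dallas', 'LOCATION')]
triplets = to_triplets(pos_tagged, ner_tagged)
print(triplets)
```

If the tokenizations can differ, the mismatch check would need to be replaced by a real alignment step.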
    

    I do this when I use an external tool for NER tagging a sentence. Now you can transform this sentence into the NLTK representation:

    from nltk import Tree


    def IOB_to_tree(iob_tagged):
        """Group adjacent tokens that share an NER label into subtrees of an S node."""
        root = Tree('S', [])
        for word, pos, label in iob_tagged:
            if label == 'O':
                # Tokens outside any entity stay as plain (word, POS) leaves.
                root.append((word, pos))
            elif root and isinstance(root[-1], Tree) and root[-1].label() == label:
                # Same label as the chunk just before: extend that chunk.
                root[-1].append((word, pos))
            else:
                # First token of a new entity: open a new chunk.
                root.append(Tree(label, [(word, pos)]))
        return root


    sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'), ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'), ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
    print(IOB_to_tree(sentence))
    

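    The change in representation makes sense, because POS tags are usually needed alongside the NER tags anyway. Note that the tags here carry no B-/I- prefixes, so two different entities of the same type that sit next to each other end up in one chunk.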

    The end result should look like:

    (S
      (PERSON Andrew/NNP)
      is/VBZ
      part/NN
      of/IN
      the/DT
      (ORGANIZATION Republican/NNP Party/NNP)
      in/IN
      (LOCATION Dallas/NNP))
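
If only the entity phrases are needed rather than a tree, the same grouping rule (merge adjacent tokens that share a non-O label) can be sketched with `itertools.groupby` from the standard library. `extract_entities` is a hypothetical helper, not an NLTK function. It reads the label from the last element of each tuple, so it should accept both the triplets above and plain (word, tag) pairs. Like the tree version, it assumes that adjacent tokens with the same label belong to one entity.

```python
from itertools import groupby


def extract_entities(tagged):
    """Return [(label, phrase), ...] for runs of adjacent tokens sharing a non-'O' label.

    Hypothetical helper: the NER label is taken from the last element of each tuple.
    """
    entities = []
    # groupby yields one group per run of consecutive tokens with the same label.
    for label, run in groupby(tagged, key=lambda token: token[-1]):
        if label != 'O':
            entities.append((label, ' '.join(token[0] for token in run)))
    return entities


sentence = [('Andrew', 'NNP', 'PERSON'), ('is', 'VBZ', 'O'), ('part', 'NN', 'O'),
            ('of', 'IN', 'O'), ('the', 'DT', 'O'), ('Republican', 'NNP', 'ORGANIZATION'),
            ('Party', 'NNP', 'ORGANIZATION'), ('in', 'IN', 'O'), ('Dallas', 'NNP', 'LOCATION')]
entities = extract_entities(sentence)
print(entities)
# [('PERSON', 'Andrew'), ('ORGANIZATION', 'Republican Party'), ('LOCATION', 'Dallas')]
```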
    
