How to get the location names (only) using Named Entity Recognition?

Submitted by a 夏天 on 2019-12-13 05:36:00

Question


I am working on a project where I need to extract the locations mentioned in a given text file. I tried the Named Entity Recognition example given here; the code snippet is shown below. However, it outputs all three kinds of entities: names, locations, and organizations. Is there a way to extract only the locations using Python?

import nltk

def extract_entity_names(t):
    entity_names = []

    if hasattr(t, 'label') and t.label:
        if t.label() == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))

    return entity_names

with open('sample.txt', 'r') as f:
    for line in f:
        sentences = nltk.sent_tokenize(line)
        tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
        tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
        chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)

        entities = []
        for tree in chunked_sentences:
            entities.extend(extract_entity_names(tree))

        print(entities)

Answer 1:


You may need to train your own Named Entity Recognition (NER) model to do that reliably. Note that with binary=True, as in your snippet, NLTK's chunker only marks each chunk as NE and does not tell you which type of entity it is.

If you're looking for a quicker solution, I would recommend the geotext package:

from geotext import GeoText

sentence = "my foreigner New York Canberra Sydney Australia, Japan, Fujimoto Godfather Avatar"
places = GeoText(sentence)

# Python 3 print calls; each attribute is a list of matched names
print(places.countries)
print(places.cities)
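
If you would rather stay within NLTK, one option is to run the chunker with binary=False and keep only the location-like labels. The following is a minimal sketch, not a definitive implementation: the helper names extract_locations and locations_in_text and the LOCATION_LABELS set are made up for illustration, and it assumes that the GPE and LOCATION labels of NLTK's pretrained chunker are what you mean by "locations".

```python
# Assumption: these two labels cover "locations" for your data.
LOCATION_LABELS = {"GPE", "LOCATION"}


def extract_locations(t, labels=LOCATION_LABELS):
    """Recursively collect the text of subtrees whose label is in `labels`."""
    found = []
    # Leaves are (word, tag) tuples and have no label() method.
    if hasattr(t, "label") and callable(t.label):
        if t.label() in labels:
            found.append(" ".join(child[0] for child in t))
        else:
            for child in t:
                found.extend(extract_locations(child, labels))
    return found


def locations_in_text(text):
    """Tokenize, POS-tag and NE-chunk `text`, then return the location names."""
    import nltk  # imported here so extract_locations can be used on its own

    sentences = nltk.sent_tokenize(text)
    tokenized = [nltk.word_tokenize(s) for s in sentences]
    tagged = [nltk.pos_tag(tokens) for tokens in tokenized]
    # binary=False yields typed labels (PERSON, ORGANIZATION, GPE, ...)
    # instead of the single generic 'NE' label.
    chunked = nltk.ne_chunk_sents(tagged, binary=False)

    locations = []
    for tree in chunked:
        locations.extend(extract_locations(tree))
    return locations
```

To use it on the file from the question, you could call locations_in_text(line) for each line of sample.txt. This needs the usual NLTK data packages (tokenizer, POS tagger, NE chunker), whose exact names depend on the NLTK version. The pretrained chunker can mislabel entities, so check the results on your own text.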


Source: https://stackoverflow.com/questions/48427348/how-to-get-the-location-namesonly-using-named-entity-recognition
