Use polyglot package for Named Entity Recognition in Hebrew

Submitted by 为君一笑 on 2019-12-09 19:24:35

Question


I am trying to use the polyglot package for Named Entity Recognition in Hebrew.
This is my code:

# -*- coding: utf8 -*-
import polyglot
from polyglot.text import Text, Word
from polyglot.downloader import downloader
downloader.download("embeddings2.iw")
text = Text(u"in france and in germany")
print(type(text))
text2 = Text(u"נסעתי מירושלים לתל אביב")
print(type(text2))
print(text.entities)
print(text2.entities)

This is the output:

<class 'polyglot.text.Text'>
<class 'polyglot.text.Text'>
[I-LOC([u'france']), I-LOC([u'germany'])]
Traceback (most recent call last):
  File "C:/Python27/Lib/site-packages/IPython/core/pyglot.py", line 15, in <module>
    print(text2.entities)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Python27\lib\site-packages\polyglot\text.py", line 132, in entities
    for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "C:\Python27\lib\site-packages\polyglot\text.py", line 100, in ne_chunker
    return get_ner_tagger(lang=self.language.code)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 191, in get_ner_tagger
    return NEChunker(lang=lang)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 104, in __init__
    super(NEChunker, self).__init__(lang=lang)
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 40, in __init__
    self.predictor = self._load_network()
  File "C:\Python27\lib\site-packages\polyglot\tag\base.py", line 109, in _load_network
    self.embeddings = load_embeddings(self.lang, type='cw', normalize=True)
  File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 30, in memoizer
    cache[key] = obj(*args, **kwargs)
  File "C:\Python27\lib\site-packages\polyglot\load.py", line 61, in load_embeddings
    p = locate_resource(src_dir, lang)
  File "C:\Python27\lib\site-packages\polyglot\load.py", line 43, in locate_resource
    if downloader.status(package_id) != downloader.INSTALLED:
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 738, in status
    info = self._info_or_id(info_or_id)
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 508, in _info_or_id
    return self.info(info_or_id)
  File "C:\Python27\lib\site-packages\polyglot\downloader.py", line 934, in info
    raise ValueError('Package %r not found in index' % id)
ValueError: Package u'embeddings2.iw' not found in index

The English text worked, but the Hebrew text did not.
Whether or not I try to download the package u'embeddings2.iw', I get:

ValueError: Package u'embeddings2.iw' not found in index

Answer 1:


I got it!
It seems like a bug to me.
The language detection identified the language as 'iw', which is the former ISO 639 language code for Hebrew; that code was later changed to 'he'. text.entities did not recognize the 'iw' code, so I changed it like so:

text2.hint_language_code = 'he'
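
The same workaround can be generalized a little. The following is only a minimal sketch and not part of polyglot: DEPRECATED_ISO_639_1, normalize_language_code and entities_with_hint are hypothetical names introduced here for illustration. It maps the withdrawn ISO 639-1 codes ('iw', 'in', 'ji') to their current equivalents and sets hint_language_code before entities is read, as in the line above. It assumes the Hebrew models that polyglot needs are already installed.

```python
# Withdrawn ISO 639-1 codes and their current replacements.
# (Hypothetical helper table, not provided by polyglot.)
DEPRECATED_ISO_639_1 = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}


def normalize_language_code(code):
    """Return the current ISO 639-1 code for a possibly withdrawn one."""
    return DEPRECATED_ISO_639_1.get(code, code)


def entities_with_hint(sentence, detected_code):
    """Build a polyglot Text and hint its language before reading entities.

    detected_code is whatever code the language detector reported,
    for example 'iw' for Hebrew text.
    """
    # Imported lazily so normalize_language_code works without polyglot.
    from polyglot.text import Text

    text = Text(sentence)
    # Same assignment as in the answer, with the code normalized first.
    text.hint_language_code = normalize_language_code(detected_code)
    return text.entities
```

With this in place, calling entities_with_hint(u"נסעתי מירושלים לתל אביב", "iw") should ask polyglot for the 'he' resources rather than the missing 'iw' ones. Codes that are not in the table are passed through unchanged.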


Source: https://stackoverflow.com/questions/38296602/use-polyglot-package-for-named-entity-recognition-in-hebrew
