spaCy: errors attempting to load serialized Doc

Posted by …衆ロ難τιáo~ on 2019-12-13 03:44:13

Question


I am trying to serialize/deserialize spaCy documents (setup is Windows 7, Anaconda) and am getting errors. I haven't been able to find any explanations. Here is a snippet of code and the error it generates:

import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
    doc.from_disk(fout)

  File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk

  File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes

ValueError: [E033] Cannot load into non-empty Doc of length 5.

I have also tried creating a new, empty Doc object and loading into it, as shown in the example ("Example: Saving and loading a document") in the spaCy docs. That results in a different error:

from spacy.tokens import Doc
from spacy.vocab import Vocab

new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
    Doc(Vocab()).from_disk(fout)

  File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk

  File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes

  File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper

  File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__

ValueError: buffer source array is read-only

EDIT:

As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing the path to a non-existent directory doesn't help, because spaCy still creates a file. Attempting to write to an existing directory causes an error too:

fout = 'data'

doc.to_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-8-6c30638f4750>", line 1, in <module>
    doc.to_disk(fout)

  File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
    opener=self._opener)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
    return self._accessor.open(self, flags, mode)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)

PermissionError: [Errno 13] Permission denied: 'data'

Python has no problem writing at this location via standard file operations (open/read/write).

Trying with a Path object yields the same results:

from pathlib import Path

import os

fout = Path(os.path.join(os.getcwd(), 'data'))

doc.to_disk(fout)
Traceback (most recent call last):

  File "<ipython-input-17-6c30638f4750>", line 1, in <module>
    doc.to_disk(fout)

  File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
    opener=self._opener)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
    return self._accessor.open(self, flags, mode)

  File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'

Any ideas why this might be happening?


Answer 1:


The argument passed to doc.to_disk(fout) must be

a path to a directory, which will be created if it doesn't exist. Paths may be either strings or Path-like objects.

according to the spaCy documentation at https://spacy.io/api/doc

Try changing fout to a directory path; that might do the trick.
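The tracebacks in the question's edit show Doc.to_disk calling pathlib's open() on the given path, which suggests that this spaCy version treats the path as a file rather than a directory. If so, pointing it at an existing directory would fail no matter what permissions the folder has. The sketch below reproduces that behaviour with the standard library only. The helper name try_binary_write and the temporary paths are made up for illustration, and the sketch says nothing about how other spaCy versions behave.

```python
import tempfile
from pathlib import Path


def try_binary_write(path):
    """Open `path` for binary writing, as the traceback suggests to_disk does.

    Returns the name of the raised error class, or None if the write succeeded.
    (Hypothetical helper, not part of spaCy.)
    """
    try:
        with Path(path).open("wb") as handle:
            handle.write(b"placeholder bytes")
    except OSError as err:
        return type(err).__name__
    return None


with tempfile.TemporaryDirectory() as tmp:
    existing_dir = Path(tmp) / "data"
    existing_dir.mkdir()
    new_file = Path(tmp) / "test.spacy"

    # Opening an existing directory as a file fails:
    # PermissionError on Windows, typically IsADirectoryError on POSIX systems.
    dir_result = try_binary_write(existing_dir)
    # A path that does not exist yet is simply created as a file.
    file_result = try_binary_write(new_file)

print("existing directory:", dir_result)
print("new file path:", file_result)
```

This would explain why ordinary open/read/write calls on files inside the folder succeed while to_disk('data') does not: the folder is writable, but the folder itself cannot be opened as a file.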

EDIT: Examples from the spaCy documentation.

For doc.to_disk:

doc.to_disk('/path/to/doc')

And for doc.from_disk:

from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
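The E033 message in the question comes from calling from_disk on the Doc that already holds the five tokens; loading has to target a new, empty Doc. Below is a minimal round-trip sketch that rests on several assumptions. It assumes a spaCy installation where the "buffer source array is read-only" error does not occur; the thread does not establish the cause of that error or which release avoids it. It uses spacy.blank("en") instead of spacy.load('en') so no model download is needed, and it reuses nlp.vocab for the new Doc rather than a fresh Vocab(). The temporary path is illustrative.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc

# A blank English pipeline (tokenizer only); an assumption made for this sketch
# so that it runs without a downloaded model.
nlp = spacy.blank("en")
doc = nlp("This is a test.")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.spacy"
    doc.to_disk(path)

    # Load into a new, empty Doc instead of the already populated `doc`.
    # Sharing nlp.vocab keeps the string store consistent between the two docs.
    restored = Doc(nlp.vocab).from_disk(path)

print(restored.text)
print(len(restored))
```

If the read-only buffer error still appears with this pattern, the problem is more likely in the installed spaCy version or its dependencies than in the calling code.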


Source: https://stackoverflow.com/questions/51711084/spacy-errors-attempting-to-load-serialized-doc
