Export vectors from fastText to spaCy

Posted by 邮差的信 on 2021-01-28 07:51:31

Question


I downloaded the 1.5 GB vectors from fasttext.cc and used the vectors_fast_text example script from the spaCy examples. I ran the following command in the terminal:

python config/vectors_fast_text.py vectors_loc data/vectors/wiki.pt.vec

After a few minutes with the processor at 100%, I received the following output:

class colspan 0.32231358

What happens from here? How can I export these vectors for use elsewhere, for example with my AWS S3 training templates?


Answer 1:


I modified the example script to load the existing data for my language, read the word2vec file, and at the end write all the content to a folder (this folder needs to exist).

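Here is the modified vectors_fast_text.py. Replace the two placeholders before running it:

[LANGUAGE] = the language code of the model to load, for example "pt"

[FILE_WORD2VEC] = the path to the word2vec file, for example "./data/word2vec.txt"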

from __future__ import unicode_literals
import plac
import numpy

import spacy


@plac.annotations()
def main():
    # Load the existing model for the language
    nlp = spacy.load('[LANGUAGE]')
    with open("[FILE_WORD2VEC]", 'rb') as file_:
        # The first line holds the number of rows and the vector width
        header = file_.readline()
        nr_row, nr_dim = header.split()
        nlp.vocab.reset_vectors(width=int(nr_dim))
        count = 0
        for line in file_:
            count += 1
            line = line.rstrip().decode('utf8')
            # Split from the right so the word stays in one piece
            pieces = line.rsplit(' ', int(nr_dim))
            word = pieces[0]
            print("{} - {}".format(count, word))  # progress output
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
    # Write the model, including the new vectors, to disk
    nlp.to_disk("./models/new_nlp/")


if __name__ == '__main__':
    plac.call(main)
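To check the parsing logic without loading spaCy or the full 1.5 GB file, the loop can be tried on a tiny in-memory sample. This is only a sketch: the function name read_vec and the sample data are made up for illustration, and it uses plain Python lists instead of numpy arrays. It assumes the same layout the script expects, a header line with the row count and vector width, followed by one word and its values per line.

```python
import io


def read_vec(lines):
    """Yield (word, vector) pairs from an iterable of byte lines in .vec format."""
    lines = iter(lines)
    # Header: "<number of rows> <number of dimensions>"
    nr_row, nr_dim = (int(n) for n in next(lines).split())
    for line in lines:
        line = line.rstrip().decode("utf8")
        # Split from the right so that a token containing a space stays whole
        pieces = line.rsplit(" ", nr_dim)
        yield pieces[0], [float(v) for v in pieces[1:]]


# Tiny made-up sample, for illustration only (not real fastText data)
sample = io.BytesIO(b"2 3\nde 0.1 0.2 0.3\nnew york 0.4 0.5 0.6\n")
rows = list(read_vec(sample))
print(rows)
```

Splitting from the right with a limit equal to the vector width is what keeps the second sample word, which contains a space, in one piece.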

To run the script, type in the terminal:

python vectors_fast_text.py

It will take about 10 minutes to finish, depending on the size of the word2vec file. The script prints each word as it is processed, so you can follow its progress.

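After that, run the following in the terminal. The second command must be run from inside the package directory that the first command creates under ./my_models/, since that is where setup.py is generated: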

python -m spacy package ./models/new_nlp/ ./my_models/
python setup.py sdist

This produces a .tar.gz archive in the dist folder, which you can then install with pip:

pip install /path/to/pt_example_model-1.0.0.tar.gz
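The file name in the pip command comes from the model's meta.json: as far as I know, spaCy builds the package name as the lang field, an underscore, the name field, then a dash and the version. The sketch below only illustrates that naming rule. The helper archive_name and the meta values are hypothetical, chosen to match the file name used above, so check your own meta.json for the real values.

```python
import json


def archive_name(meta):
    """Return the sdist file name expected for a model's meta.json contents."""
    return "{lang}_{name}-{version}.tar.gz".format(**meta)


# Hypothetical meta.json contents, chosen to match the file name used above
meta = json.loads('{"lang": "pt", "name": "example_model", "version": "1.0.0"}')
print(archive_name(meta))
```

If the name or version in meta.json differs, the archive in the dist folder will be named accordingly, and the pip command has to point at that file.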

A detailed tutorial can be found on the spaCy website: https://spacy.io/usage/training



Source: https://stackoverflow.com/questions/49663387/export-vectors-from-fasttext-to-spacy
