gensim

Gensim: how to load precomputed word vectors from text file

依然范特西╮ submitted on 2021-02-18 22:17:46
Question: I have a text file with my precomputed word vectors in the following format (example): 'word -0.0762464299711 0.0128308048976 ... 0.0712385589283\n', one such line for every word (with 297 more floats in place of the ...). I am trying to load these with Gensim as KeyedVectors, because I ultimately would like to compute cosine similarity, find the most similar words, etc. Unfortunately I have not worked with Gensim before, and from the documentation it is not quite clear to me how to do this. I
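One common way to approach this, sketched here under assumptions rather than taken from the truncated post: gensim's word2vec text reader expects a first line holding the vocabulary size and the vector dimension, which the file described above lacks. The helper name add_word2vec_header, the file names, and the three-dimensional toy rows are all hypothetical; the sketch also assumes every word is a single whitespace-free token.

```python
import os
import tempfile


def add_word2vec_header(src_path, dst_path):
    """Copy src_path to dst_path, prepending a '<count> <dim>' header line."""
    with open(src_path, encoding="utf-8") as src:
        # Each non-empty line is: word float float ... float
        rows = [line.split() for line in src if line.strip()]
    if not rows:
        raise ValueError("no vectors found in input file")
    dim = len(rows[0]) - 1
    if any(len(row) - 1 != dim for row in rows):
        raise ValueError("rows do not all have the same number of floats")
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{len(rows)} {dim}\n")
        for row in rows:
            dst.write(" ".join(row) + "\n")
    return len(rows), dim


# Tiny stand-in for the real file (which would have 300 floats per line).
tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "vectors_raw.txt")
w2v_path = os.path.join(tmp_dir, "vectors_w2v.txt")
with open(raw_path, "w", encoding="utf-8") as f:
    f.write("desk 0.1 0.2 0.3\n")
    f.write("table 0.1 0.2 0.25\n")

count, dim = add_word2vec_header(raw_path, w2v_path)

with open(w2v_path, encoding="utf-8") as f:
    header = f.readline().strip()

# With gensim installed, the converted file can then be loaded, e.g.:
#   from gensim.models import KeyedVectors
#   kv = KeyedVectors.load_word2vec_format(w2v_path, binary=False)
#   kv.most_similar("desk")
#   kv.similarity("desk", "table")
```

In gensim 4.x, load_word2vec_format also accepts no_header=True, which should make the conversion step unnecessary; on older versions the header has to be present in the file.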

pip install pyemd error?

旧城冷巷雨未停 submitted on 2021-02-17 06:17:06
Question: I'm trying to install the pyemd package in Python through pip and am getting the following error:
    C:\Users\dipanwita.neogy>pip install pyemd
    Collecting pyemd
      Using cached pyemd-0.4.3.tar.gz
    Requirement already satisfied: numpy<2.0.0,>=1.9.0 in c:\users\dipanwita.neogy\anaconda3\lib\site-packages (from pyemd)
    Building wheels for collected packages: pyemd
      Running setup.py bdist_wheel for pyemd ... error
      Complete output from command C:\Users\dipanwita.neogy\Anaconda3\python.exe -u -c "import setuptools,

Does WikiCorpus from the gensim library work on an Arabic Wikipedia dump?

丶灬走出姿态 submitted on 2021-02-11 14:45:22
Question: I have seen code that uses WikiCorpus on an Arabic Wikipedia dump, and I know that the process takes a long time to execute. I also searched for the warning that I get when executing it, which says: (UserWarning: detected Windows; aliasing chunkize to chunkize_serial warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")), and the answers said that it is fine, nothing serious, just a warning. But after waiting about 3 days without any response, I started wondering whether

“ImportError: DLL load failed: The specified module could not be found” when trying to import gensim

风流意气都作罢 submitted on 2021-02-11 12:20:19
Question: While trying to import gensim, I run into the following error:
    Traceback (most recent call last):
      File "c:\Users\usr\Documents\hello\test.py", line 3, in <module>
        import gensim
      File "C:\Users\usr\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\gensim\__init__.py", line 5, in <module>
        from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils # noqa:F401
      File "C:\Users\usr

Word2vec - get rank of similarity

跟風遠走 submitted on 2021-02-10 12:58:41
Question: Given a word2vec model (trained with gensim), I want to get the similarity rank between two words. For example, let's say I have the word "desk" and the most similar words to "desk" are:
    table 0.64
    chair 0.61
    book 0.59
    pencil 0.52
I want to create a function such that f(desk, book) = 3, since book is the 3rd most similar word to desk. Does such a function exist? What is the most efficient way to do this?
Answer 1: You can use rank(entity1, entity2) to get the distance rank, which is the same as the index. model.wv.rank(sample
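To make the idea behind the answer concrete, here is a minimal pure-Python sketch of a similarity rank: count how many other words are closer to the query word than the target is, then add one. It does not call gensim; the function name similarity_rank and the two-dimensional toy vectors are invented for illustration and do not reproduce the scores quoted in the question.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_rank(vectors, word, other):
    """1-based position of `other` among all words ordered by similarity to `word`."""
    target = cosine(vectors[word], vectors[other])
    # Count the words that are strictly more similar to `word` than `other` is.
    closer = sum(
        1
        for key, vec in vectors.items()
        if key not in (word, other) and cosine(vectors[word], vec) > target
    )
    return closer + 1


# Hypothetical toy vectors, chosen so that the ordering matches the question:
# table, chair, book, pencil in decreasing similarity to desk.
toy_vectors = {
    "desk": (1.0, 0.0),
    "table": (1.0, 0.2),
    "chair": (1.0, 0.5),
    "book": (1.0, 1.0),
    "pencil": (0.0, 1.0),
}

book_rank = similarity_rank(toy_vectors, "desk", "book")
print(book_rank)
```

This scans the whole vocabulary once per query, so it is linear in vocabulary size; with a trained model, the rank method on model.wv named in the answer is the natural thing to try first.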

Gensim LDA for text classification

99封情书 submitted on 2021-02-10 09:54:10
Question: I am posting my question here because there are already some answers on how to use scikit methods with gensim, like scikit vectorizers with gensim or this, but I haven't seen the whole pipeline used for text classification. I will try to explain my situation a little. I want to use the LDA methods implemented in gensim in order to proceed to text classification. I have one dataset which consists of three parts (train (25K), test (25K), and unlabeled data (50K)). What I am trying to do is

Export gensim doc2vec embeddings into separate file to use with keras Embedding layer later

|▌冷眼眸甩不掉的悲伤 submitted on 2021-02-10 07:08:07
Question: I am a bit new to gensim, and right now I am trying to solve a problem which involves using doc2vec embeddings in keras. I wasn't able to find an existing implementation of doc2vec in keras; as far as I can see, in all the examples I have found so far everyone just uses gensim to get the document embeddings. Once I have trained my doc2vec model in gensim, I need to export the embedding weights from gensim into keras somehow, and it is not really clear how to do that. I see that model.syn0 supposedly gives
