fasttext

FastText recall is 'nan' but precision is a number

旧时模样 posted on 2021-02-10 05:31:20
Question: I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall. First, I trained a model:
    model = fasttext.train_supervised("train.txt", wordNgrams=3, epoch=100, pretrainedVectors=pretrained_model)
Then I get results for the test data:
    def print_results(N, p, r):
        print("N\t" + str(N))
        print("P@{}\t{:.3f}".format(1, p))
        print("R@{}\t{:.3f}".format(1, r))
    print_results(*model.test('test.txt'))
But the results are always odd, because
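One way to cross-check the numbers reported by model.test is to compute precision and recall by hand from gold labels and predictions. The sketch below is a plain-Python illustration, not fastText's own implementation; the function name precision_recall_at_k and its inputs are hypothetical, and it assumes the gold labels and ranked predictions have already been collected. It returns nan whenever a denominator is zero, which is one possible (unconfirmed) way a nan recall could arise.

```python
import math


def precision_recall_at_k(gold, predicted, k=1):
    """Compute micro-averaged precision and recall at k.

    gold:      list of sets, the true labels of each example
    predicted: list of lists, the ranked predicted labels of each example
    """
    n_correct = 0    # predicted labels that are also gold labels
    n_predicted = 0  # total number of labels predicted (at most k each)
    n_gold = 0       # total number of gold labels
    for true_labels, ranked in zip(gold, predicted):
        top_k = ranked[:k]
        n_correct += sum(1 for label in top_k if label in true_labels)
        n_predicted += len(top_k)
        n_gold += len(true_labels)
    # A zero denominator has no meaningful ratio, so report nan.
    precision = n_correct / n_predicted if n_predicted else math.nan
    recall = n_correct / n_gold if n_gold else math.nan
    return precision, recall
```

If the hand-computed recall is a number while the reported one is nan, the test file's label format is a reasonable first thing to inspect.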

Install fasttext on Windows 10 with anaconda

纵饮孤独 posted on 2021-02-07 17:14:26
Question: I am trying to install fasttext in Anaconda on Windows 10 using the command: pip install fasttext as explained here: https://pypi.org/project/fasttext/ The error messages are: ValueError: Unknown MS Compiler version 1900 and Command "c:\users\nicol\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\nicol\\AppData\\Local\\Temp\\pip-install-pd0xqmlg\\fasttext\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec

FastText - Cannot load model.bin due to C++ extension failed to allocate the memory

随声附和 posted on 2021-02-07 05:58:07
Question: I'm trying to use the FastText Python API https://pypi.python.org/pypi/fasttext However, from what I've read, this API can't load the newer .bin model files at https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md as suggested in https://github.com/salestock/fastText.py/issues/115 I've tried everything that is suggested in that issue, and furthermore https://github.com/Kyubyong/wordvectors doesn't have the .bin for English, otherwise the problem would be solved. Does

Difference between Fasttext .vec and .bin file

孤人 posted on 2021-02-06 09:45:11
Question: I recently downloaded the fasttext pretrained model for English. I got two files: wiki.en.vec and wiki.en.bin. What is the difference between the two files?
Answer 1: The .vec files contain only the aggregated word vectors, in plain text. The .bin files additionally contain the model parameters and, crucially, the vectors for all the n-grams. So if you want to encode words you did not train with using those n-grams (FastText's famous "subword information"), you need to find an API that can
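Because a .vec file is plain text, it can be read without any fastText API. The sketch below assumes the usual layout of such files (a header line with the word count and the dimension, then one word per line followed by its vector components, separated by spaces); the function name read_vec and the limit parameter are hypothetical. It only recovers whole-word vectors, which is exactly the limitation described above: no n-gram vectors, so no vectors for unseen words.

```python
import io


def read_vec(handle, limit=None):
    """Read word vectors from an open text-format .vec file.

    Returns (dim, vectors) where vectors maps each word to a list of floats.
    Pass limit to stop after that many words (useful for large files).
    """
    header = handle.readline().split()
    dim = int(header[1])  # header is "<word count> <dimension>"
    vectors = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        if limit is not None and len(vectors) >= limit:
            break
        # Split from the right so the last `dim` fields are the numbers.
        parts = line.rstrip().rsplit(" ", dim)
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return dim, vectors


# Tiny in-memory example standing in for a real file such as wiki.en.vec.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 1 2 3 \n")
dim, vectors = read_vec(sample)
```

With a real file, the same function would be called on open("wiki.en.vec", encoding="utf-8"), ideally with a limit since these files are large.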

Fasttext how to load a .csv column into model.predict

拜拜、爱过 posted on 2021-02-04 12:46:05
Question: I am new to Python and NLP. I have followed this tutorial (https://fasttext.cc/docs/en/supervised-tutorial.html) to train my fastText supervised model in Python. I have a csv with a Text column and I would like to predict labels for every row in the file. My question is how I can load (transform) the csv column into the predict input and save the labels. model.predict("Which baking dish is best to bake a banana bread ?", k=-1, threshold=0.5) Instead of this (the text in "Which baking....") I
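A minimal sketch of one way to do this with the standard csv module follows. It assumes the column is named Text as in the question, that model is an object with a predict(text, k=..., threshold=...) method returning a (labels, probabilities) pair as in the call quoted above, and that the function name predict_csv_column, the file paths, and the output column name labels are all made up for illustration. Whitespace is collapsed first because predict is understood to work on a single line of text.

```python
import csv


def predict_csv_column(model, in_path, out_path, text_column="Text",
                       k=-1, threshold=0.5):
    """Copy a CSV file, adding a 'labels' column with the model's predictions."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames) + ["labels"]
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # Collapse newlines and repeated spaces into single spaces.
            text = " ".join(row[text_column].split())
            labels, _probs = model.predict(text, k=k, threshold=threshold)
            row["labels"] = " ".join(labels)
            writer.writerow(row)
```

Usage would look like predict_csv_column(model, "questions.csv", "questions_labeled.csv"), where both file names are placeholders.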

fastText -autotune-validation not working

旧城冷巷雨未停 posted on 2021-01-29 12:40:36
Question: I am trying to use the autotune-validation command to check the F1 score of a dataset I am working on. I tested fastText on two different machines (Ubuntu and Mac), but I got the following error: Unknown argument: -autotune-validation The following arguments are mandatory: -input training file path -output output file path The following arguments are optional: -verbose verbosity level [2] ... I tried to read the documentation, but according to the fastText website, this command should work

fasttext models detecting norwegian text as danish [closed]

给你一囗甜甜゛ posted on 2021-01-29 06:50:08
Question: I am using fasttext (v=0.9.1) to detect the language of a text (see this). Norwegian text is being detected as Danish when using this model. !curl "https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin" > lid.bin import fastText language_detector=fastText.load

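fastText (3): Applying fastText to Weibo Short Texts (2)

房东的猫 posted on 2021-01-09 12:00:40
The previous post discussed how fastText overfits the training data. This post describes some attempts to improve fastText's generalization.
Model generalization: People who have used fastText are often won over by its many strengths, such as its training speed and the fact that it provides both word embeddings and classification. But just as a coin has two sides, fastText is not perfect: generalization is its weak point.
Adding a regularization term: In logistic regression, tuning the regularization term can improve a model's generalization. As shown in the previous post, fastText's cost function is:
$$L(d,h) = -\sum_{i=1}^{C} y_i \log P_i = -\sum_{i=1}^{C} y_i \log \frac{e^{\theta_i^{T} h}}{\sum_{j=1}^{C} e^{\theta_j^{T} h}}$$
With a regularization term added, the cost function becomes:
$$L(d,h) = -\sum_{i=1}^{C} y_i \log P_i + \lambda \sum_{i=1}^{V} \lVert w_i \rVert + \mu \sum_{j=1}^{C} \lVert \theta_j \rVert$$
The update rule for the word vectors then becomes:
$$w_j = w_j - \eta \sum_{i=1}^{C} (P_i - y_i)\,\theta_i - \lambda w_j, \quad j = 1, 2, \ldots, L$$
Once the regularization term is added, the word vectors of a sentence can no longer all be updated in the same direction, so the similarity between word vectors is no longer guaranteed. At that point, fastText is no different from an ordinary feed-forward neural network (DNN)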
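The regularized update above can be sketched in plain Python. This is only an illustration under stated assumptions: the hidden vector h is taken to be the average of the sentence's word vectors, the names regularized_step, eta, and lam are made up here, and the step follows the update formula as written in the post rather than fastText's actual source code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_step(word_vecs, thetas, y, eta=0.1, lam=0.0):
    """Apply w_j <- w_j - eta * sum_i (P_i - y_i) * theta_i - lam * w_j.

    word_vecs: word vectors of one sentence (list of lists)
    thetas:    one weight vector per class
    y:         one-hot list of true class indicators
    """
    dim = len(word_vecs[0])
    # Assumption: the sentence representation h is the mean word vector.
    h = [sum(w[d] for w in word_vecs) / len(word_vecs) for d in range(dim)]
    probs = softmax([sum(t[d] * h[d] for d in range(dim)) for t in thetas])
    # This gradient term is identical for every word in the sentence.
    shared = [sum((p - yi) * t[d] for p, yi, t in zip(probs, y, thetas))
              for d in range(dim)]
    # The lam * w term depends on each word's own vector.
    return [[w[d] - eta * shared[d] - lam * w[d] for d in range(dim)]
            for w in word_vecs]
```

With lam=0 every word vector in the sentence moves by the same amount; with lam>0 each vector is also shrunk toward zero in proportion to itself, so the per-word updates differ, which is the effect the post describes.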

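Introduction to FastText

强颜欢笑 posted on 2021-01-09 12:00:09
When I interviewed for an NLP engineer position at Baidu, I was asked which methods are commonly used to learn word-vector representations. I said I knew word2vec, and then the interviewer asked whether I knew FastText... Awkward: I didn't!
Unlike word2vec, fasttext exploits the morphology of words, that is, their internal structure, or subword information. Which makes me wonder: could fasttext be used to train character vectors from the radicals of Chinese characters? n-grams need a character sequence, though; the stroke order of a character, perhaps? Hmm... Still, building word vectors from character vectors is certainly convenient.
So what is subword information? fasttext uses character n-grams. Take the word where: its character 3-grams are <wh, whe, her, ere, re>, plus the whole word itself, <where>. The neat thing about the angle brackets is that they make it easy to tell the word her apart from the subword her inside where: the word her is represented by the sequence <her>, which is not the same as the trigram her, so the two can be distinguished.
So why use subword information? Facebook's researchers argue that in distributed word representation models such as word2vec, information is not shared well between words, that is, parameters are not shared effectively. By breaking words down into finer-grained subwords and sharing the subword representations, information can be shared.
The concrete approach: given a dictionary of character n-grams, say of size G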
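The where example can be reproduced with a few lines of Python. This is a sketch of the idea only: the function name char_ngrams is hypothetical, it extracts n-grams of a single length, and it does not attempt to mirror how the fastText library stores or hashes n-grams.

```python
def char_ngrams(word, n=3):
    """Return (n-grams, whole-word token) for a word wrapped in < and >."""
    marked = "<" + word + ">"  # boundary markers, e.g. "<where>"
    grams = [marked[i:i + n] for i in range(len(marked) - n + 1)]
    return grams, marked


grams, whole = char_ngrams("where")
# grams is ['<wh', 'whe', 'her', 'ere', 're>'] and whole is '<where>'
her_grams, her_whole = char_ngrams("her")
# her_whole is '<her>', a different token from the trigram 'her' in grams
```

Keeping the whole-word token separate from the n-gram list makes the distinction explicit: the trigram her appears among the n-grams of where, while the word her is identified by <her>.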

Error installing fasttext: command 'x86_64-linux-gnu-gcc

旧巷老猫 posted on 2021-01-09 11:13:20
I had heard that installing fasttext is a hassle on Windows and easy on Linux. As it turned out, a plain pip install worked on my Windows 10 machine, while on Ubuntu pip install fasttext failed with: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1 This is once again a dependency problem: copy the command below to install the missing dependencies, and then pip install fasttext will succeed. apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip Source: oschina Link: https://my.oschina.net/u/3726752/blog/1843950