
FastText recall is 'nan' but precision is a number

旧时模样 posted on 2021-02-10 05:31:20
Question: I trained a supervised model in FastText using the Python interface and I'm getting weird results for precision and recall. First, I trained a model:

    model = fasttext.train_supervised("train.txt", wordNgrams=3, epoch=100, pretrainedVectors=pretrained_model)

Then I get results for the test data:

    def print_results(N, p, r):
        print("N\t" + str(N))
        print("P@{}\t{:.3f}".format(1, p))
        print("R@{}\t{:.3f}".format(1, r))

    print_results(*model.test('test.txt'))

But the results are always odd, because
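One way to sanity-check the numbers printed by print_results is to recompute precision and recall by hand from gold and predicted labels. The sketch below assumes the usual micro-averaged definitions (precision = correct predictions / all predictions, recall = correct predictions / all gold labels). The helper name precision_recall and the sample labels are hypothetical. It does not call fastText and is not claimed to reproduce its internals.

```python
import math


def precision_recall(gold, predicted):
    """Micro-averaged precision and recall over per-example label lists."""
    n_correct = sum(len(set(g) & set(p)) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    # A zero denominator is reported as nan instead of raising an error.
    precision = n_correct / n_predicted if n_predicted else math.nan
    recall = n_correct / n_gold if n_gold else math.nan
    return precision, recall


# Hypothetical gold labels and top-1 predictions for three test lines.
gold = [["__label__a"], ["__label__b"], ["__label__a", "__label__c"]]
predicted = [["__label__a"], ["__label__a"], ["__label__c"]]

p, r = precision_recall(gold, predicted)
print("P@{}\t{:.3f}".format(1, p))
print("R@{}\t{:.3f}".format(1, r))
```

In this sketch a nan can only come from a zero denominator, for example when no gold labels are counted for the test set, so comparing these counts with N from model.test may help narrow down where an odd value originates.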

Install fasttext on Windows 10 with anaconda

纵饮孤独 posted on 2021-02-07 17:14:26
Question: I am trying to install fasttext in Anaconda on Windows 10 using the command: pip install fasttext as explained here: The error messages are: ValueError: Unknown MS Compiler version 1900 and Command "c:\users\nicol\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\nicol\\AppData\\Local\\Temp\\pip-install-pd0xqmlg\\fasttext\\';f=getattr(tokenize, 'open', open)(__file__);'\r\n', '\n');f.close();exec

FastText - Cannot load model.bin due to C++ extension failed to allocate the memory

随声附和 posted on 2021-02-07 05:58:07
Question: I'm trying to use the FastText Python API. However, from what I've read, this API can't load the newer .bin model files at as suggested in I've tried everything that is suggested at that issue, and furthermore doesn't have the .bin for English, otherwise the problem would be solved. Does

Difference between Fasttext .vec and .bin file

孤人 posted on 2021-02-06 09:45:11
Question: I recently downloaded the fastText pretrained model for English. I got two files: wiki.en.vec and wiki.en.bin. I am not sure what the difference between the two files is.

Answer 1: The .vec files contain only the aggregated word vectors, in plain text. The .bin files additionally contain the model parameters and, crucially, the vectors for all the n-grams. So if you want to encode words you did not train with using those n-grams (FastText's famous "subword information"), you need to find an API that can
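To make the "plain text" point concrete, here is a minimal sketch of reading a .vec file, assuming the common layout of a header line with the word count and dimension, followed by one word and its values per line. The function name read_vec and the tiny sample are hypothetical. Real files such as wiki.en.vec are much larger.

```python
import io


def read_vec(lines):
    """Parse .vec-style text: a 'count dim' header, then 'word v1 ... v_dim'."""
    it = iter(lines)
    count, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        tokens = line.split()
        if len(tokens) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[tokens[0]] = [float(x) for x in tokens[1:]]
    return vectors


# Hypothetical two-word, three-dimensional sample standing in for a real file.
sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat -0.5 0.0 0.25\n")
vectors = read_vec(sample)
print(vectors["cat"])
print("dog" in vectors)  # a word missing from the file has no vector here
```

A lookup table like this has no entry for unseen words, which is the practical consequence of the .vec file lacking the n-gram vectors described in the answer.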

Fasttext how to load a .csv column into model.predict

拜拜、爱过 posted on 2021-02-04 12:46:05
Question: I am new to Python and NLP. I have followed this tutorial ( to train my fastText supervised model in Python. I have a csv with a Text column and I would like to predict labels for every row in the file. My question is how I can load (transform) the csv column into the predict input and save the label. model.predict("Which baking dish is best to bake a banana bread ?", k=-1, threshold=0.5) instead of this ( the text in ""Which baking....") I
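One possible approach is to loop over the CSV rows with the standard csv module and call predict on each text. The sketch below is only an illustration: label_rows and StubModel are hypothetical names, and StubModel stands in for a trained fastText model so that the example runs without one. It assumes predict returns a (labels, probabilities) pair for a single string, and it strips newlines because predict handles one line at a time.

```python
import csv
import io


def label_rows(model, in_file, out_file, text_column="Text", k=-1, threshold=0.5):
    """Predict labels for every row of a CSV column and write them out."""
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file)
    writer.writerow([text_column, "labels"])
    for row in reader:
        # predict expects a single line of text, so newlines are replaced.
        text = row[text_column].replace("\n", " ")
        labels, _probs = model.predict(text, k=k, threshold=threshold)
        writer.writerow([text, " ".join(labels)])


class StubModel:
    """Hypothetical stand-in for a trained model, for demonstration only."""

    def predict(self, text, k=1, threshold=0.0):
        label = "__label__baking" if "bake" in text else "__label__other"
        return (label,), (1.0,)


source = io.StringIO(
    "Text\n"
    "Which baking dish is best to bake a banana bread ?\n"
    "How do I sharpen a knife ?\n"
)
result = io.StringIO()
label_rows(StubModel(), source, result)
rows = list(csv.reader(io.StringIO(result.getvalue())))
print(rows)
```

With a real model, the StubModel instance would be replaced by the object returned from fasttext.train_supervised or fasttext.load_model, and the StringIO objects by files opened with open(..., newline="").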

fastText -autotune-validation not working

旧城冷巷雨未停 posted on 2021-01-29 12:40:36
Question: I am trying to use the autotune-validation command to check the f1 score of a dataset I am working on. I tested fastText on two different machines (Ubuntu and Mac), but I got the following error:

    Unknown argument: -autotune-validation
    The following arguments are mandatory:
      -input training file path
      -output output file path
    The following arguments are optional:
      -verbose verbosity level [2]
    ...

I tried to read the documentation, but according to the fastText website, this command should work

fasttext models detecting norwegian text as danish [closed]

给你一囗甜甜゛ posted on 2021-01-29 06:50:08
Question: I am using fasttext (v=0.9.1) to detect the language of a text (see this). Norwegian text is being detected as Danish when using this model. !curl "" > lid.bin import fastText language_detector=fastText.load


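房东的猫 posted on 2021-01-09 12:00:40
The previous post discussed how fastText overfits the training data. This post describes some attempts to improve fastText's ability to generalize.

Model generalization

People who have used fastText are often won over by its many features, such as its training speed and the fact that it provides both word embeddings and classification. But just as a coin has two sides, fastText is not perfect: generalization is its weak point.

Adding a regularization term

In Logistic Regression, tuning the regularization term can improve a model's generalization performance. As shown in the previous post, fastText's cost function is:

$$L(d,h) = -\sum_{i=1}^{C} y_i \log P_i = -\sum_{i=1}^{C} y_i \log \frac{e^{\theta_i^T h}}{\sum_{j=1}^{C} e^{\theta_j^T h}}$$

After adding regularization terms, the cost function becomes:

$$L(d,h) = -\sum_{i=1}^{C} y_i \log P_i + \lambda \sum_{i=1}^{V} \|w_i\| + \mu \sum_{j=1}^{C} \|\theta_j\|$$

The update rule for the word vectors then becomes:

$$w_j = w_j - \eta \sum_{i=1}^{C} (P_i - y_i)\,\theta_i - \lambda w_j, \quad j = 1, 2, \dots, L$$

Once the regularization term is added, the word vectors of a sentence can no longer all be updated in the same direction, so the similarity between word vectors is no longer guaranteed. At that point, fastText is no different from an ordinary feed-forward neural network (DNN).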
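The effect described above can be illustrated with a small numeric sketch of the update rule exactly as written in the post (shared gradient term plus a per-word decay term). The function names, the toy vectors, and the values of the learning rate and decay coefficient are all assumptions made for illustration. This is not fastText's own training code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regularized_step(word_vecs, thetas, y, eta=0.1, lam=0.0):
    """One update w_j <- w_j - eta * sum_i (P_i - y_i) * theta_i - lam * w_j."""
    dim = len(word_vecs[0])
    # Hidden vector h: average of the sentence's word vectors.
    h = [sum(w[d] for w in word_vecs) / len(word_vecs) for d in range(dim)]
    probs = softmax([sum(t[d] * h[d] for d in range(dim)) for t in thetas])
    # Gradient term shared by every word in the sentence.
    grad = [sum((p - yi) * t[d] for p, yi, t in zip(probs, y, thetas))
            for d in range(dim)]
    return [[w[d] - eta * grad[d] - lam * w[d] for d in range(dim)]
            for w in word_vecs]


def shift(old, new):
    """Per-word change between two lists of vectors."""
    return [[n - o for o, n in zip(ov, nv)] for ov, nv in zip(old, new)]


old = [[1.0, 0.0], [0.0, 2.0]]         # toy word vectors of one sentence
thetas = [[0.5, -0.5], [-0.5, 0.5]]    # toy class vectors
y = [1, 0]                             # one-hot gold label

shift_plain = shift(old, regularized_step(old, thetas, y, lam=0.0))
shift_decayed = shift(old, regularized_step(old, thetas, y, lam=0.1))
print(shift_plain)
print(shift_decayed)
```

Without the decay term both words move by the same vector. With it, each word's movement also depends on its own current value, so the directions diverge.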

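Introduction to FastText

强颜欢笑 posted on 2021-01-09 12:00:09
Introduction to FastText

While interviewing for an NLP engineer position at Baidu, I was asked which word-vector representation learning methods are commonly used. I said I knew word2vec, and then the interviewer asked whether I knew FastText... That was awkward, because I didn't!

Unlike word2vec, fastText uses the morphological information of words, that is, their internal structure, or subword information. Come to think of it, could fastText be used to train character vectors from the radicals of Chinese characters? n-grams need a character sequence, though, so perhaps the stroke order of the characters? Hmm... Still, building word vectors from character vectors is certainly convenient.

So what is subword information? fastText uses character n-grams. Take the word where: its character 3-gram subwords are <wh, whe, her, ere, re>, plus the word itself, <where>. The nice thing about the angle brackets is that they make it easy to distinguish the word her from the subword her of where: the word her is represented by the sequence <her>, which differs from the 3-gram her, so the two can be told apart.

Why use subword information at all? The Facebook researchers argue that in distributional word representation models such as word2vec, information is not shared well between words, meaning that parameters are not shared effectively. Decomposing words into finer-grained subwords and sharing the subword representations achieves this information sharing.

The concrete approach

Given a character n-gram dictionary, suppose its size is G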
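The where example above can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical helper names. It uses a single n and skips the n-gram hashing that a real implementation would need for a large vocabulary.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = "<" + word + ">"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]


def subwords(word, n=3):
    """The n-grams plus the whole wrapped word as a special sequence."""
    wrapped = "<" + word + ">"
    grams = char_ngrams(word, n)
    if wrapped not in grams:
        grams.append(wrapped)
    return grams


where_subwords = subwords("where")
her_subwords = subwords("her")
print(where_subwords)
print(her_subwords)
```

The whole-word sequence <her> appears only in the subwords of her, never in those of where, which is what keeps the word her distinct from the 3-gram her inside where.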

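fasttext installation error: command 'x86_64-linux-gnu-gcc

旧巷老猫 posted on 2021-01-09 11:13:20
I had heard that installing fasttext on Windows is a hassle while installing it on Linux is easy. Yet on Win10 a plain pip install just worked, whereas on Ubuntu pip install fasttext failed with: error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

It turned out to be a dependency problem. Copy the command below to install the missing dependencies, then run pip install fasttext again and it will install:

    apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip

Source: oschina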