nlp

After training word embeddings with gensim's FastText wrapper, how do I embed new sentences?

Submitted by 隐身守侯 on 2021-01-07 03:56:06
Question: After reading the tutorial in gensim's docs, I do not understand the correct way of generating new embeddings from a trained model. So far I have trained gensim's FastText embeddings like this:

    from gensim.models.fasttext import FastText as FT_gensim

    model_gensim = FT_gensim(size=100)

    # build the vocabulary
    model_gensim.build_vocab(corpus_file=corpus_file)

    # train the model
    model_gensim.train(
        corpus_file=corpus_file,
        epochs=model_gensim.epochs,
        total_examples=model_gensim.corpus
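One common way to embed new text with a trained FastText model is to look up each token (FastText composes vectors for out-of-vocabulary words from character n-grams) and average the results. A minimal sketch; the sentence_vector helper is hypothetical, not part of gensim's API:

    import numpy as np

    def sentence_vector(model, sentence):
        # Hypothetical helper: average the per-token FastText vectors.
        # model.wv[token] also works for unseen words thanks to subword n-grams.
        tokens = sentence.lower().split()
        return np.mean([model.wv[t] for t in tokens], axis=0)

    new_vec = sentence_vector(model_gensim, "a brand new sentence to embed")
    print(new_vec.shape)   # (100,) given size=100 above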

Is there a method to select all MeSH categories with SPARQL?

Submitted by 前提是你 on 2021-01-07 03:22:32
Question: I want to get data with SPARQL from the Medical Subject Headings (MeSH) RDF. I tried this query:

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
    PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
    PREFIX mesh2015: <http://id.nlm.nih.gov/mesh/2015/>
    PREFIX mesh2016: <http://id.nlm.nih.gov/mesh/2016/>
    PREFIX mesh2017: <http://id.nlm.nih.gov/mesh/2017/>

    SELECT DISTINCT ?descriptor ?label
    FROM <http://id.nlm.nih.gov
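To run such a query from Python, one option is SPARQLWrapper against NLM's public MeSH SPARQL service. A minimal sketch; the endpoint URL, graph URI, and the meshv:TopicalDescriptor class are assumptions to adjust to your setup:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Assumed endpoint and graph; change them if you query a local copy of MeSH.
    endpoint = SPARQLWrapper("https://id.nlm.nih.gov/mesh/sparql")
    endpoint.setQuery("""
        PREFIX meshv: <http://id.nlm.nih.gov/mesh/vocab#>
        PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?descriptor ?label
        FROM <http://id.nlm.nih.gov/mesh>
        WHERE {
            ?descriptor a meshv:TopicalDescriptor ;
                        rdfs:label ?label .
        }
        LIMIT 25
    """)
    endpoint.setReturnFormat(JSON)

    for row in endpoint.query().convert()["results"]["bindings"]:
        print(row["descriptor"]["value"], row["label"]["value"])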

How can I modify the language model before applying patterns?

Submitted by 亡梦爱人 on 2021-01-07 02:49:59
Question: I have this code:

    import spacy
    from spacy.matcher import Matcher, PhraseMatcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab, validate=True)

    patterns = [
        [{'POS': 'QUALIF'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
    ]
    matcher.add("process_1", None, *patterns)

    texts = ["it is a beautiful and big apple"]
    for text in texts:
        doc = nlp(text)
        matches = matcher(doc)
        for _, start, end in matches:
            print(doc[start:end].text)

So, I want to
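A note on the pattern: 'QUALIF' is not one of the Universal POS tags that en_core_web_sm produces, so that token spec can never match. A minimal sketch, assuming the intent was to match an adjective + conjunction + adjective + noun span using the standard 'ADJ' tag:

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab, validate=True)

    patterns = [
        [{'POS': 'ADJ'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
    ]
    # spaCy 2.x signature, as in the question; spaCy 3.x uses matcher.add("process_1", patterns)
    matcher.add("process_1", None, *patterns)

    doc = nlp("it is a beautiful and big apple")
    for _, start, end in matcher(doc):
        print(doc[start:end].text)   # expected: "beautiful and big apple"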

What is a reliable way to convert text data (documents) to numerical vectors and save them for later use?

Submitted by 落爺英雄遲暮 on 2021-01-07 02:44:58
Question: Machines can't understand text directly, only numbers, so in NLP we convert text to some numeric representation; one of these is the bag-of-words (BOW) representation. My objective is to convert every document to a numeric representation and save it for future use. Currently I do this by converting the text to BOW and saving it in a pickle file. My question is whether we can do this in a better and more reliable way, so that every document can be saved as some vector
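One common and reproducible approach is to persist the fitted vectorizer separately from the sparse document-term matrix, so new documents can later be mapped into the same vocabulary. A minimal sketch with scikit-learn; the file names and toy corpus are illustrative:

    import joblib
    from scipy import sparse
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog ate my homework"]

    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(docs)               # sparse (n_docs, n_terms) matrix

    joblib.dump(vectorizer, "bow_vectorizer.joblib")   # keep the vocabulary for reuse
    sparse.save_npz("bow_matrix.npz", bow)

    # Later: load the vectorizer and encode new documents with the same vocabulary
    vectorizer = joblib.load("bow_vectorizer.joblib")
    new_vec = vectorizer.transform(["the cat ate my mat"])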

Is it possible to find uncertainties of spaCy POS tags?

Submitted by 眉间皱痕 on 2021-01-05 09:01:16
Question: I am trying to build a non-English spell checker that relies on spaCy's analysis of sentences, which lets my algorithm use the POS tags and grammatical dependencies of individual tokens to detect incorrect spelling (in my case, specifically, incorrect splits in Dutch compound words). However, spaCy appears to tag sentences incorrectly when they contain grammatical errors, for example classifying a noun as a verb, even though the classified word doesn't even
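For reference, the token-level attributes this approach relies on are read as shown below; spaCy's pos_ and dep_ expose only the single best label per token, not a confidence score. A minimal sketch, assuming the small Dutch pipeline nl_core_news_sm (any Dutch model works) and an illustrative sentence with an incorrectly split compound:

    import spacy

    nlp = spacy.load("nl_core_news_sm")
    doc = nlp("De lucht ballon steeg langzaam op.")   # "luchtballon" incorrectly split

    for token in doc:
        # POS tag, dependency relation, and syntactic head for each token
        print(token.text, token.pos_, token.dep_, token.head.text)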

Problems with gensim WikiCorpus - aliasing chunkize to chunkize_serial; (__mp_main__ instead of __main__?)

Submitted by 一曲冷凌霜 on 2021-01-05 06:48:32
Question: I'm quite new to Python and coding in general, and I seem to have run into an issue. I'm trying to run this code (credit to Matthew Mayo; the whole thing can be found here):

    # import warnings
    # warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')
    import sys
    from gensim.corpora import WikiCorpus

    def make_corpus(in_f, out_f):
        print(0)
        output = open(out_f, 'w', encoding='utf-8')
        print(1)
        wiki = WikiCorpus(in_f)
        print(2)
        i = 0
        for text in wiki.get_texts():
            output
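The __mp_main__ in the warning usually means the module is being re-imported by a multiprocessing worker (gensim's WikiCorpus spawns worker processes), which on Windows re-executes top-level code. The usual remedy is to guard the entry point. A minimal sketch, assuming the script is run as python make_wiki_corpus.py <dump.xml.bz2> <out.txt> (file names are illustrative):

    import sys
    from gensim.corpora import WikiCorpus

    def make_corpus(in_f, out_f):
        wiki = WikiCorpus(in_f)
        with open(out_f, 'w', encoding='utf-8') as output:
            for i, text in enumerate(wiki.get_texts(), start=1):
                output.write(' '.join(text) + '\n')
                if i % 10000 == 0:
                    print(f'Processed {i} articles')

    if __name__ == '__main__':
        # The guard keeps worker processes (imported as __mp_main__) from
        # re-running the corpus build when they import this module.
        make_corpus(sys.argv[1], sys.argv[2])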

Sliding window for long text in BERT for Question Answering

Submitted by 岁酱吖の on 2021-01-05 00:51:51
Question: I've read a post which explains how the sliding window works, but I cannot find any information on how it is actually implemented. From what I understand, if the input is too long, a sliding window can be used to process the text. Please correct me if I am wrong. Say I have the text "In June 2017 Kaggle announced that it passed 1 million registered users". Given some stride and max_len, the input can be split into chunks with overlapping words (not considering padding). In June 2017 Kaggle
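In practice, Hugging Face's fast tokenizers provide this overlapping-chunk behaviour via return_overflowing_tokens and stride. A minimal sketch, assuming the transformers library; the max_length and stride values are deliberately tiny so the overlap is visible:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    text = "In June 2017 Kaggle announced that it passed 1 million registered users"
    enc = tokenizer(
        text,
        max_length=8,                    # window size in tokens (including special tokens)
        stride=3,                        # tokens shared between consecutive chunks
        truncation=True,
        return_overflowing_tokens=True,
    )

    for ids in enc["input_ids"]:
        print(tokenizer.decode(ids))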
