part-of-speech

Querying part-of-speech tags with Lucene 7 OpenNLP

故事扮演 submitted on 2021-02-20 03:50:40
Question: For fun and learning, I am trying to build a part-of-speech (POS) tagger with OpenNLP and Lucene 7.4. The goal is that, once the text is indexed, I can search for a sequence of POS tags and find all sentences that match that sequence. I already understand the indexing part, but I am stuck on the query part. I am aware that Solr might have some functionality for this, and I already checked its code (which was not so self-explanatory after all). But my goal is to understand and implement this in Lucene 7, not
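A minimal sketch of the idea behind such a query, written in plain Python rather than against Lucene: index each sentence's tags as tokens together with their positions, then accept a sentence only if the query tags occur at consecutive positions. This is roughly the positional check a Lucene PhraseQuery performs over a field whose tokens are the tags. The function names and sample sentences are hypothetical, and the Lucene/OpenNLP analysis chain itself is not shown.

```python
from collections import defaultdict


def build_index(tagged_sentences):
    """Build a positional inverted index: tag -> {sentence_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for sent_id, tags in enumerate(tagged_sentences):
        for position, tag in enumerate(tags):
            index[tag][sent_id].append(position)
    return index


def phrase_query(index, tag_sequence):
    """Return sorted ids of sentences where tag_sequence occurs at consecutive positions."""
    if not tag_sequence:
        return []
    hits = []
    # Candidate sentences are those containing the first tag of the sequence.
    for sent_id, starts in index.get(tag_sequence[0], {}).items():
        for start in starts:
            # Every following tag must sit exactly `offset` positions after the start.
            if all(
                start + offset in index.get(tag, {}).get(sent_id, [])
                for offset, tag in enumerate(tag_sequence[1:], start=1)
            ):
                hits.append(sent_id)
                break  # one match per sentence is enough
    return sorted(hits)


# Hypothetical, already-tagged sentences (Penn Treebank tags).
sentences = [
    ["PRP", "VBP", "VBG", "NNS", "."],  # They are hunting dogs .
    ["DT", "NN", "VBD", "."],           # The dog barked .
    ["PRP", "VBD", "DT", "NN", "."],    # She saw the cat .
]
index = build_index(sentences)
print(phrase_query(index, ["DT", "NN"]))  # [1, 2]
```

Storing positions, not just which sentences contain a tag, is what makes sequence search possible: a plain term lookup for DT and NN would also match sentences where the two tags are far apart.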

WordNet - What does n and the number represent?

半世苍凉 submitted on 2020-12-29 13:14:21
Question: My question is related to the WordNet interface.

    >>> wn.synsets('cat')
    [Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]
    >>>

I could not find out what the n and the following number in cat.n.01 or caterpillar.n.02 are for.

Answer 1: Per the NLTK docs, a <lemma>.<pos>.<number> Synset string
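To make the <lemma>.<pos>.<number> format concrete, here is a small pure-Python sketch that splits a synset name into its parts. The POS letters are WordNet's (NLTK exposes them as wordnet.NOUN, VERB, ADJ, ADJ_SAT and ADV); the number is the sense number. The helper name is hypothetical. As far as I know, the name is built from the synset's first lemma, which is why entries such as guy.n.01 or vomit.v.01 appear when looking up 'cat': those senses contain "cat" but are named after another lemma.

```python
# WordNet part-of-speech letters used in synset names.
POS_NAMES = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}


def parse_synset_name(name):
    """Split '<lemma>.<pos>.<number>' into (lemma, pos name, sense number)."""
    # Split from the right: the lemma part may itself contain dots.
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, POS_NAMES[pos], int(number)


for name in ["cat.n.01", "caterpillar.n.02", "vomit.v.01"]:
    print(parse_synset_name(name))
```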

Count verbs, nouns, and other parts of speech with Python's NLTK

江枫思渺然 submitted on 2019-12-18 12:14:44
Question: I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used. I have tagged the text but am not sure how to go further:

    import nltk

    tokens = nltk.word_tokenize(text.lower())
    text = nltk.Text(tokens)
    tags = nltk.pos_tag(text)

How can I save the counts for each part of speech into a variable?

Answer 1: The pos_tag method gives you back a list of (token, tag) pairs:
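Given such a list of (token, tag) pairs, collections.Counter is one straightforward way to store the counts in a variable. The sketch below uses a hypothetical hand-written tag list in place of real nltk.pos_tag output, and the noun/verb grouping by Penn Treebank tag prefix is an illustrative choice, not something NLTK does for you.

```python
from collections import Counter

# Hypothetical output of nltk.pos_tag(tokens): a list of (token, tag) pairs.
tags = [
    ("they", "PRP"), ("are", "VBP"), ("hunting", "VBG"), ("dogs", "NNS"),
    ("and", "CC"), ("cats", "NNS"), ("run", "VBP"), (".", "."),
]

# Count every Penn Treebank tag separately.
tag_counts = Counter(tag for _token, tag in tags)


def coarse_pos(tag):
    """Collapse Penn Treebank tags into a few broad classes."""
    if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
        return "noun"
    if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
        return "verb"
    return "other"


# Count broad classes, e.g. to build a per-text profile.
coarse_counts = Counter(coarse_pos(tag) for _token, tag in tags)
print(tag_counts["NNS"], coarse_counts["verb"])
```

Running one Counter per text gives directly comparable profiles; dividing by the number of tokens turns them into relative frequencies.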

Painfully slow Postgres query using WHERE on many adjacent rows

亡梦爱人 submitted on 2019-12-12 18:37:48
Question: I have the following psql table. It has roughly 2 billion rows in total.

    id  word      lemma  pos    textid  source
    1   Stuffing  stuff  vvg    190568  AN
    2   her       her    appge  190568  AN
    3   key       key    nn1    190568  AN
    4   into      into   ii     190568  AN
    5   the       the    at     190568  AN
    6   lock      lock   nn1    190568  AN
    7   she       she    appge  190568  AN
    8   pushed    push   vvd    190568  AN
    9   her       her    appge  190568  AN
    10  way       way    nn1    190568  AN
    11  into      into   ii     190568  AN
    12  the       the    appge  190568  AN
    13  house     house  nn1    190568  AN
    14  .         .             190568  AN
    15  She       she    appge  190568  AN
    16  had
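The excerpt is cut off before the actual query, so the following is only an assumption about its goal: finding runs of rows with consecutive ids in the same text whose pos values follow a given pattern. A small Python sketch of that matching logic over rows 1 to 13 of the table above makes the intended result explicit; the function name is hypothetical. In SQL the same condition is usually expressed with self-joins on id + 1, id + 2, and so on, or with the LEAD() window function.

```python
# Rows 1-13 of the table above as (id, word, pos, textid).
rows = [
    (1, "Stuffing", "vvg", 190568), (2, "her", "appge", 190568),
    (3, "key", "nn1", 190568), (4, "into", "ii", 190568),
    (5, "the", "at", 190568), (6, "lock", "nn1", 190568),
    (7, "she", "appge", 190568), (8, "pushed", "vvd", 190568),
    (9, "her", "appge", 190568), (10, "way", "nn1", 190568),
    (11, "into", "ii", 190568), (12, "the", "appge", 190568),
    (13, "house", "nn1", 190568),
]


def find_pos_sequence(rows, pattern):
    """Return the starting id of every run of consecutive ids, within one text,
    whose pos values equal pattern in order."""
    by_id = {row[0]: row for row in rows}
    hits = []
    for row_id, _word, _pos, textid in rows:
        # The candidate window: this row and the next len(pattern) - 1 ids.
        window = [by_id.get(row_id + k) for k in range(len(pattern))]
        if all(
            r is not None and r[3] == textid and r[2] == tag
            for r, tag in zip(window, pattern)
        ):
            hits.append(row_id)
    return hits


print(find_pos_sequence(rows, ["ii", "at", "nn1"]))
```

Here only "into the lock" (ids 4 to 6) matches, because row 12 ("the") is tagged appge rather than at.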

Forcing POS tags in Stanford CoreNLP

故事扮演 submitted on 2019-12-12 02:08:47
Question: Is there a way to process already POS-tagged text using Stanford CoreNLP? For example, I have the sentence in this format:

    They_PRP are_VBP hunting_VBG dogs_NNS ._.

and I'd like to annotate it with lemma, ner, parse, etc. while forcing the given POS annotation.

Update: I tried this code, but it's not working.

    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String sentText = "They_PRP are
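Whatever the CoreNLP side looks like, the word_TAG input first has to be separated into words and tags, since otherwise the tags are part of the text handed to the tokenizer. The Python sketch below shows only that step (the helper name is hypothetical). How to attach the tags to CoreNLP tokens in place of the pos annotator's output is not covered in the excerpt and is not shown here.

```python
def parse_tagged(text, sep="_"):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last separator only, so '._.' becomes ('.', '.').
        word, tag = token.rsplit(sep, 1)
        pairs.append((word, tag))
    return pairs


pairs = parse_tagged("They_PRP are_VBP hunting_VBG dogs_NNS ._.")
words = [word for word, _tag in pairs]  # text to feed to the annotators
tags = [tag for _word, tag in pairs]    # tags to force onto those tokens
```

Splitting on the last underscore keeps words that contain underscores themselves intact, at the cost of assuming tags never do.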

How to pass part-of-speech in WordNetLemmatizer?

自闭症网瘾萝莉.ら submitted on 2019-12-11 16:26:52
Question: I am preprocessing text data, but I am facing an issue with lemmatizing. Below is the sample text:

    'An 18-year-old boy was referred to prosecutors Thursday for allegedly stealing about ¥15 million ($134,300) worth of cryptocurrency last year by hacking a digital currency storage website, police said.',
    'The case is the first in Japan in which criminal charges have been pursued against a hacker over cryptocurrency losses, the police said.',
    '\n',
    'The boy, from the city of Utsunomiya,
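NLTK's WordNetLemmatizer.lemmatize(word, pos='n') treats every word as a noun unless told otherwise, so verb forms such as "stealing" or "referred" are generally only reduced when pos='v' is passed. A common approach is to map the Penn Treebank tags from nltk.pos_tag to WordNet's POS letters; the sketch below is one such mapping (the function name and the fallback to noun are illustrative choices, not part of NLTK).

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter.

    The letters match nltk.corpus.wordnet.ADJ, VERB, ADV and NOUN. Tags with
    no WordNet counterpart fall back to noun, the lemmatizer's own default.
    """
    if tag.startswith("J"):  # JJ, JJR, JJS
        return "a"
    if tag.startswith("V"):  # VB, VBD, VBG, VBN, VBP, VBZ
        return "v"
    if tag.startswith("R"):  # RB, RBR, RBS (and RP)
        return "r"
    return "n"


print([penn_to_wordnet(t) for t in ["VBG", "JJ", "RB", "NNS", "PRP"]])
```

With NLTK installed, each (word, tag) pair from nltk.pos_tag(tokens) would then be lemmatized as WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag)).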