part-of-speech | 易学教程

Querying part-of-speech tags with Lucene 7 OpenNLP

Question: For fun and learning, I am trying to build a part-of-speech (POS) tagger with OpenNLP and Lucene 7.4. The goal is that, once the text is indexed, I can search for a sequence of POS tags and find all sentences that match that sequence. I already have the indexing part working, but I am stuck on the query part. I am aware that Solr might have some functionality for this, and I already checked the code (which was not so self-explanatory after all). But my goal is to understand and implement this in Lucene 7, not
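The search described in the question, finding every sentence that contains a given sequence of POS tags, can be illustrated outside Lucene. What follows is a minimal plain-Python sketch of the matching idea only, not the Lucene or OpenNLP API. The function name `find_tag_sequence` and the hand-tagged sample sentences are hypothetical.

```python
from typing import List, Sequence, Tuple

TaggedSentence = List[Tuple[str, str]]  # (token, POS tag) pairs


def find_tag_sequence(sentences: Sequence[TaggedSentence],
                      pattern: Sequence[str]) -> List[int]:
    """Return indices of sentences whose tags contain `pattern` contiguously."""
    pattern = list(pattern)
    hits = []
    for idx, sentence in enumerate(sentences):
        tags = [tag for _, tag in sentence]
        # Slide a window of len(pattern) over the sentence's tag list.
        if any(tags[i:i + len(pattern)] == pattern
               for i in range(len(tags) - len(pattern) + 1)):
            hits.append(idx)
    return hits


# Hypothetical, hand-tagged sample data (Penn Treebank style tags).
sentences = [
    [("They", "PRP"), ("are", "VBP"), ("hunting", "VBG"), ("dogs", "NNS")],
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
print(find_tag_sequence(sentences, ["DT", "NN"]))
```

A linear scan like this only shows what a match means. An index-backed search would need to answer the same question without visiting every sentence.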

WordNet - What does n and the number represent?

Question: My question is related to the WordNet interface.

>>> wn.synsets('cat')
[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]

I could not find an explanation of what the n and the following number mean in cat.n.01 or caterpillar.n.02.

Answer 1: Per the NLTK docs, a <lemma>.<pos>.<number> Synset string
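To make the `<lemma>.<pos>.<number>` pattern concrete, here is a small sketch that splits a synset name into its three parts using only the standard library. The helper `parse_synset_name` and the `SynsetName` type are hypothetical names, not part of NLTK. In these names `n` marks a noun and `v` a verb, and the number is the sense index for that lemma and part of speech.

```python
from typing import NamedTuple


class SynsetName(NamedTuple):
    lemma: str
    pos: str      # e.g. "n" for noun, "v" for verb
    number: int   # sense index within that lemma and part of speech


def parse_synset_name(name: str) -> SynsetName:
    """Split a '<lemma>.<pos>.<number>' string into its parts."""
    # Split from the right so a lemma that itself contains dots stays intact.
    lemma, pos, number = name.rsplit(".", 2)
    return SynsetName(lemma, pos, int(number))


for name in ["cat.n.01", "caterpillar.n.02", "cat.v.01"]:
    print(parse_synset_name(name))
```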

Count verbs, nouns, and other parts of speech with python's NLTK

Question: I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used. I have tagged the text but am not sure how to go further:

tokens = nltk.word_tokenize(text.lower())
text = nltk.Text(tokens)
tags = nltk.pos_tag(text)

How can I save the counts for each part of speech into a variable?

Answer 1: The pos_tag method gives you back a list of (token, tag) pairs:
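Given a list of (token, tag) pairs, one way to keep per-tag counts in a variable is `collections.Counter`. This is a minimal sketch rather than the original answer's code. The `tagged` list is hypothetical sample data standing in for the output of `nltk.pos_tag`, so the snippet runs without NLTK installed.

```python
from collections import Counter

# Hypothetical stand-in for the result of nltk.pos_tag(...).
tagged = [
    ("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"),
]

# Count how often each tag occurs, ignoring the tokens themselves.
tag_counts = Counter(tag for _, tag in tagged)

print(tag_counts["NN"])   # number of tokens tagged NN
print(dict(tag_counts))   # all counts as a plain dict
```

Because a `Counter` returns 0 for missing keys, profiles of different texts can be compared tag by tag even when a tag never occurs in one of them.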

Painfully slow Postgres query using WHERE on many adjacent rows

Question: I have the following Postgres table. It has roughly 2 billion rows in total.

id  word      lemma  pos    textid  source
1   Stuffing  stuff  vvg    190568  AN
2   her       her    appge  190568  AN
3   key       key    nn1    190568  AN
4   into      into   ii     190568  AN
5   the       the    at     190568  AN
6   lock      lock   nn1    190568  AN
7   she       she    appge  190568  AN
8   pushed    push   vvd    190568  AN
9   her       her    appge  190568  AN
10  way       way    nn1    190568  AN
11  into      into   ii     190568  AN
12  the       the    appge  190568  AN
13  house     house  nn1    190568  AN
14  .         .             190568  AN
15  She       she    appge  190568  AN
16  had

Forcing POS tags in Stanford CoreNLP

Question: Is there a way to process an already POS-tagged text using Stanford CoreNLP? For example, I have the sentence in this format:

They_PRP are_VBP hunting_VBG dogs_NNS ._.

and I'd like to annotate it with lemma, ner, parse, etc. by forcing the given POS annotation.

Update: I tried this code, but it's not working.

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
String sentText = "They_PRP are
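Whatever mechanism is used to hand the tags to the pipeline, the `word_TAG` input first has to be split into words and tags. The sketch below shows only that preprocessing step in plain Python. It is not CoreNLP API, and the function name `split_tagged` is hypothetical.

```python
from typing import List, Tuple


def split_tagged(text: str, sep: str = "_") -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word_TAG' items into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # Split on the last separator so words containing '_' keep it.
        word, tag = item.rsplit(sep, 1)
        pairs.append((word, tag))
    return pairs


pairs = split_tagged("They_PRP are_VBP hunting_VBG dogs_NNS ._.")
words = [word for word, _ in pairs]
tags = [tag for _, tag in pairs]
print(words)
print(tags)
```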

How to pass part-of-speech in WordNetLemmatizer?

Question: I am preprocessing text data. However, I am facing an issue with lemmatizing. Below is the sample text: 'An 18-year-old boy was referred to prosecutors Thursday for allegedly stealing about ¥15 million ($134,300) worth of cryptocurrency last year by hacking a digital currency storage website, police said.', 'The case is the first in Japan in which criminal charges have been pursued against a hacker over cryptocurrency losses, the police said.', '\n', 'The boy, from the city of Utsunomiya,
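A common way to pass a part of speech to NLTK's `WordNetLemmatizer` is to map each Penn Treebank tag (as produced by `nltk.pos_tag`) to one of WordNet's single-letter POS codes and supply it as the `pos` argument of `lemmatize`. The sketch below shows only that mapping, so it runs without NLTK. The function name `penn_to_wordnet` is hypothetical, and falling back to noun for unknown tags is an assumption that mirrors the lemmatizer's default.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS code ('a', 'v', 'n' or 'r')."""
    if tag.startswith("J"):
        return "a"  # adjective (wordnet.ADJ in NLTK)
    if tag.startswith("V"):
        return "v"  # verb (wordnet.VERB)
    if tag.startswith("R"):
        return "r"  # adverb (wordnet.ADV)
    return "n"      # noun (wordnet.NOUN); assumed fallback for other tags


# Hypothetical tagged tokens, standing in for nltk.pos_tag output.
tagged = [("boy", "NN"), ("was", "VBD"), ("referred", "VBN"), ("allegedly", "RB")]
for word, tag in tagged:
    # With NLTK installed, the code could be used as:
    #   WordNetLemmatizer().lemmatize(word, pos=penn_to_wordnet(tag))
    print(word, penn_to_wordnet(tag))
```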