nlp

Get synonyms from synset returns error - Python

◇◆丶佛笑我妖孽 submitted on 2020-01-01 03:42:09
Question: I'm trying to get synonyms of a given word using WordNet. The problem is that although I'm doing the same as is written here, it returns an error. Here is my code:

from nltk.corpus import wordnet as wn
import nltk
dog = wn.synset('dog.n.01')
print dog.lemma_names
>>> <bound method Synset.lemma_names of Synset('dog.n.01')>
for i,j in enumerate(wn.synsets('small')):
    print "Synonyms:", ", ".join(j.lemma_names)
>>> Synonyms: Traceback (most recent call last):
  File "C:/Users/Python

extracting relations from text

不问归期 submitted on 2020-01-01 03:31:17
Question: I want to extract relations from unstructured text in the form of (SUBJECT, OBJECT, ACTION) triples. For instance, "The boy is sitting on the table eating the chicken" would give me (boy, chicken, eat), (boy, table, LOCATION), etc. Although a Python program plus NLTK could process a simple sentence like the one above, I'd like to know if any of you have used tools or libraries, preferably open source, to extract relations from a much wider domain such as a large collection of text documents or the web. Answer 1:
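As a toy illustration only (real systems use dependency parsing), a naive heuristic over a POS-tagged sentence can already pull out one (SUBJECT, OBJECT, ACTION) triple: take the first noun before the first verb group as the subject, the last verb of that group as the action, and the first noun after it as the object. The tagged input is hardcoded here to keep the sketch self-contained:

```python
def extract_triple(tagged):
    """Naive (subject, object, action) extraction from a POS-tagged sentence.

    Heuristic only: subject = first noun before the verb group,
    action = last verb of the first verb group ("is sitting" -> "sitting"),
    object = first noun after the action.
    """
    verb_idxs = [i for i, (_, tag) in enumerate(tagged) if tag.startswith('VB')]
    action_idx = verb_idxs[0]
    while action_idx + 1 in verb_idxs:       # skip auxiliaries like "is"
        action_idx += 1
    subject = next(w for w, tag in tagged[:verb_idxs[0]] if tag.startswith('NN'))
    obj = next(w for w, tag in tagged[action_idx + 1:] if tag.startswith('NN'))
    return (subject, obj, tagged[action_idx][0])

# "The boy is sitting on the table", tagged as nltk.pos_tag would tag it
tagged = [('The', 'DT'), ('boy', 'NN'), ('is', 'VBZ'), ('sitting', 'VBG'),
          ('on', 'IN'), ('the', 'DT'), ('table', 'NN')]
print(extract_triple(tagged))  # ('boy', 'table', 'sitting')
```

This falls apart on anything beyond simple clauses, which is exactly why the question asks about dedicated relation-extraction tooling.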

python nltk keyword extraction from sentence

Deadly submitted on 2020-01-01 03:21:10
Question: "First thing we do, let's kill all the lawyers." - William Shakespeare. Given the quote above, I would like to pull out "kill" and "lawyers" as the two prominent keywords describing the overall meaning of the sentence. I have extracted the following noun/verb POS tags: [["First", "NNP"], ["thing", "NN"], ["do", "VBP"], ["lets", "NNS"], ["kill", "VB"], ["lawyers", "NNS"]] The more general problem I am trying to solve is to distill a sentence to the "most important"* words/tags to summarise the
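One simple route, using the tag list already shown: keep nouns and verbs, then drop generic filler words via a small stoplist. The stoplist contents here are an assumption chosen for illustration; in practice one would use frequency-based weighting (e.g. TF-IDF) instead of a hand-made list:

```python
def keywords(tagged, stop=frozenset({'first', 'thing', 'do', 'lets'})):
    """Keep content-bearing nouns/verbs, dropping a small stoplist of filler words."""
    return [w for w, tag in tagged
            if (tag.startswith('NN') or tag.startswith('VB'))
            and w.lower() not in stop]

tagged = [["First", "NNP"], ["thing", "NN"], ["do", "VBP"],
          ["lets", "NNS"], ["kill", "VB"], ["lawyers", "NNS"]]
print(keywords(tagged))  # ['kill', 'lawyers']
```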

Convert chinese characters to hanyu pinyin

若如初见. submitted on 2020-01-01 03:19:10
Question: How do I convert Chinese characters to hanyu pinyin? E.g. 你 --> Nǐ, 马 --> Mǎ. More info: either accented or numerical forms of hanyu pinyin are acceptable, the numerical form being my preference. A Java library is preferred; however, a library in another language that can be put in a wrapper is also OK. I would like anyone who has personally used such a library to recommend or comment on it in terms of its quality/reliability. Answer 1: The problem of converting hanzi to pinyin is a
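The core of any such library is a character-to-reading lookup table. The shape of a numeric-form converter can be sketched with a hand-made toy table (the two entries come from the question; a real library ships a table covering thousands of characters plus heteronym handling):

```python
# Toy hanzi -> numeric-pinyin table; the two entries are from the question.
# A real library (e.g. pinyin4j in Java, pypinyin in Python) covers far more.
PINYIN = {'你': 'ni3', '马': 'ma3'}

def to_pinyin(text):
    """Replace each known hanzi with its numeric pinyin; pass others through."""
    return ' '.join(PINYIN.get(ch, ch) for ch in text)

print(to_pinyin('你马'))  # ni3 ma3
```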

NLTK Context-Free Grammar Generation

﹥>﹥吖頭↗ submitted on 2020-01-01 02:44:16
Question: I'm working on a non-English parser with Unicode characters. For that, I decided to use NLTK. But it requires a predefined context-free grammar such as:

S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"

In my app, I am supposed to minimize hard coding with the use of a rule-based grammar. For example, I can
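For reference, the grammar shown above can be loaded and used directly. A minimal sketch, assuming nltk is installed (CFG parsing needs no corpus download):

```python
import nltk

# The exact grammar from the question, loaded from a string
grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John saw a dog".split()):
    print(tree)
```

To reduce hard coding, the grammar string can be assembled programmatically (e.g. generating the lexical rules from a word list) before passing it to `CFG.fromstring`.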

How to perform FST (Finite State Transducer) composition

我怕爱的太早我们不能终老 submitted on 2020-01-01 02:34:09
Question: Consider the following FSTs:

T1:
0 1 a : b
0 2 b : b
2 3 b : b
0 0 a : a
1 3 b : a

T2:
0 1 b : a
1 2 b : a
1 1 a : d
1 2 a : c

How do I perform the composition operation on these two FSTs (i.e. T1 ∘ T2)? I saw some algorithms but couldn't understand much. If anyone could explain it in an easy way, it would be a major help. Please note that this is NOT homework; the example is taken from lecture slides where the solution is given, but I couldn't figure out how to get to it. Answer 1: Since you
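The core rule of composition: a T1 arc p→q with label a:b pairs with a T2 arc r→s with label b:c (T1's output symbol matches T2's input symbol), producing an arc (p,r)→(q,s) with label a:c in T1 ∘ T2. A minimal sketch for epsilon-free transducers, using the transitions from the question:

```python
def compose(t1, t2):
    """Compose two epsilon-free FSTs given as (src, dst, in_sym, out_sym) arcs.

    A T1 arc a:b pairs with a T2 arc b:c whenever T1's output matches
    T2's input, yielding an arc a:c between paired states (p, r) -> (q, s).
    """
    result = []
    for (p, q, a, b) in t1:
        for (r, s, b2, c) in t2:
            if b == b2:
                result.append(((p, r), (q, s), a, c))
    return result

# T1 and T2 from the question, as (src, dst, input, output) tuples
T1 = [(0, 1, 'a', 'b'), (0, 2, 'b', 'b'), (2, 3, 'b', 'b'),
      (0, 0, 'a', 'a'), (1, 3, 'b', 'a')]
T2 = [(0, 1, 'b', 'a'), (1, 2, 'b', 'a'), (1, 1, 'a', 'd'), (1, 2, 'a', 'c')]

for arc in compose(T1, T2):
    print(arc)
```

Note this naive pairing also emits arcs between pair-states unreachable from the start state (0, 0); a full algorithm explores outward from (0, 0) and prunes the rest, and needs extra bookkeeping for epsilon transitions.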

How to determine the language (English, Chinese, …) of a given string in Oracle?

走远了吗. submitted on 2019-12-31 22:03:36
Question: How can I determine the language (English, Chinese, ...) of a given string (table column value) in Oracle (multi-language environment)? Answer 1: It should be possible to use a library like Language Detection for Java and tie it in with your PL/SQL. It will probably be more efficient to use SQL to do naive Bayesian filtering and use language profiles derived e.g. from Wikipedia (they are neatly packed here). These are just pointers, not a full solution as requested for the bounty, but should help bounty
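The idea behind such profile-based detection can be sketched in a few lines: classify by where the string's characters fall in Unicode. This is a crude stand-in for real character-n-gram Bayesian profiles, and it only separates scripts, not languages that share a script (e.g. English vs. French):

```python
def guess_language(text):
    """Crude script-based guess: mostly-CJK codepoints -> 'Chinese', else 'English'.

    Toy stand-in for character-profile classification; it distinguishes
    scripts only, not languages written in the same script.
    """
    cjk = sum(1 for ch in text if 0x4E00 <= ord(ch) <= 0x9FFF)
    letters = sum(1 for ch in text if ch.isalpha())
    if letters == 0:
        return 'unknown'
    return 'Chinese' if cjk / letters > 0.5 else 'English'

print(guess_language('hello world'))  # English
print(guess_language('你好世界'))      # Chinese
```

The same codepoint-range test could be expressed in PL/SQL with `ASCIISTR`/`UNISTR` range checks, keeping the detection inside Oracle.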

Realtime tracking of top 100 twitter words per min/hour/day

烈酒焚心 submitted on 2019-12-31 15:47:12
Question: I recently came across this interview question: given a continuous Twitter feed, design an algorithm to return the 100 most frequent words used in this minute, this hour, and this day. I was thinking of a system with a hash map of word -> count linked to three min-heaps for the current minute, hour, and day. Every incoming message is tokenized and sanitized, and the word counts are updated in the hash map (with an increase-key in the heaps if the word already exists there). If any of the words don't exist in the
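One way to realize the sliding windows described above is to keep a per-minute counter bucket and sum the buckets inside the window on query. A simplified sketch for the minute window only (the hour and day windows are analogous); a production system would maintain running totals and incremental heaps rather than re-summing:

```python
import heapq
from collections import Counter, deque

class TopWords:
    """Track the top-k words over the last `window` minute-buckets.

    Simplified sketch: per-minute Counters sit in a deque and are summed
    on query, which is O(buckets) per query but easy to reason about.
    """
    def __init__(self, window=60):
        self.window = window
        self.buckets = deque()          # (minute, Counter) pairs, oldest first

    def add(self, minute, words):
        if not self.buckets or self.buckets[-1][0] != minute:
            self.buckets.append((minute, Counter()))
            while self.buckets[0][0] <= minute - self.window:
                self.buckets.popleft()  # expire buckets outside the window
        self.buckets[-1][1].update(words)

    def top(self, k=100):
        total = Counter()
        for _, counts in self.buckets:
            total.update(counts)
        return heapq.nlargest(k, total.items(), key=lambda kv: kv[1])

tw = TopWords(window=60)
tw.add(0, ['nlp', 'twitter', 'nlp'])
tw.add(1, ['twitter', 'nlp'])
print(tw.top(2))  # [('nlp', 3), ('twitter', 2)]
```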

How do you find the subject of a sentence? [closed]

荒凉一梦 submitted on 2019-12-31 10:46:43
Question: Closed. This question is off-topic and is not currently accepting answers. Closed 5 years ago. I am new to NLP and was doing research about which language toolkit I should use to do the following. I would like to do one of two things, which accomplish the same goal: I basically would like to classify a text, usually one sentence that contains 15 words. I would like to classify whether the sentence is

Computer AI algorithm to write sentences?

北城余情 submitted on 2019-12-31 08:13:52
Question: I am searching for information on algorithms to process text sentences, or to follow a structure when creating sentences, that are valid in a normal human language such as English. I would like to know if there are projects working in this field that I can learn from or start using. For example, if I gave a program a noun and provided it with a thesaurus (for related words) and part-of-speech information (so it understood where each word belonged in a sentence) - could it create a random, valid sentence? I
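The "random, valid sentence" idea can be demonstrated with a tiny random generator over a phrase-structure grammar. The grammar and lexicon below are hypothetical placeholders; in a real system the terminal rules would be filled from the thesaurus and part-of-speech data the question mentions:

```python
import random

# Hypothetical toy grammar: nonterminals map to lists of expansions;
# lowercase strings are terminals in this sketch.
GRAMMAR = {
    'S':   [['NP', 'VP']],
    'NP':  [['Det', 'N']],
    'VP':  [['V', 'NP']],
    'Det': [['the'], ['a']],
    'N':   [['dog'], ['park'], ['telescope']],
    'V':   [['saw'], ['ate']],
}

def generate(symbol='S', rng=random):
    """Recursively expand a nonterminal into a random list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]                      # terminal: emit as-is
    expansion = rng.choice(GRAMMAR[symbol])
    return [w for part in expansion for w in generate(part, rng)]

print(' '.join(generate()))  # e.g. "the dog saw a park"
```

Every output is grammatical by construction, though not necessarily sensible ("a telescope ate the park"); making output semantically plausible is the hard part the question is really asking about.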