nlp

Natural language date/time parser for .NET? [closed]

允我心安 posted on 2019-12-17 09:21:20
Question (closed as off-topic 2 years ago; not currently accepting answers): Does anyone know of a .NET date/time parser similar to Chronic for Ruby (one that handles input like "tomorrow" or "3pm next thursday")? Note: I do write Ruby (which is how I know about Chronic), but this project must use .NET.
Answer 1: We developed exactly what you are looking for on an internal project. We are thinking of

NLTK and language detection

筅森魡賤 posted on 2019-12-17 07:12:46
Question: How do I detect what language a text is written in using NLTK? The examples I've seen use nltk.detect, but after installing NLTK on my Mac I cannot find this package.
Answer 1: Have you come across the following code snippet?
    english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    text_vocab = set(w.lower() for w in text if w.lower().isalpha())
    unusual = text_vocab.difference(english_vocab)
from http://groups.google.com/group/nltk-users/browse_thread/thread/a5f52af2cbc4cfeb?pli=1&safe
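The vocabulary-difference idea in the quoted snippet can be turned into a rough language guess: compare the text against one word list per language and pick the language that leaves the fewest unknown words. The sketch below is only an illustration under stated assumptions. The function names and the tiny `VOCABS` word lists are hypothetical, and real use would need much larger vocabularies (for English, `nltk.corpus.words.words()` as in the snippet above). It uses only the standard library so it runs without NLTK data.

```python
import re


def unusual_words(text, vocab):
    """Return the lowercased alphabetic tokens of `text` that are not in `vocab`."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return set(tokens) - set(vocab)


def guess_language(text, vocabularies):
    """Pick the language whose vocabulary leaves the fewest unknown words."""
    return min(
        vocabularies,
        key=lambda lang: len(unusual_words(text, vocabularies[lang])),
    )


# Tiny hypothetical vocabularies, for illustration only.
VOCABS = {
    "english": {"the", "cat", "is", "on", "table", "and", "a"},
    "spanish": {"el", "gato", "está", "en", "la", "mesa", "y", "un"},
}

print(guess_language("The cat is on the table", VOCABS))
print(guess_language("El gato está en la mesa", VOCABS))
```

With vocabularies this small, ties and misclassifications are likely; the approach only becomes meaningful with full word lists.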

How do I do dependency parsing in NLTK?

懵懂的女人 posted on 2019-12-17 07:08:34
Question: Going through the NLTK book, it's not clear how to generate a dependency tree from a given sentence. The relevant section of the book, the sub-chapter on dependency grammar, gives an example figure, but it doesn't show how to parse a sentence to come up with those relationships. Or maybe I'm missing something fundamental in NLP? EDIT: I want something similar to what the Stanford parser does. Given the sentence "I shot an elephant in my sleep", it should return something like:
    nsubj(shot-2, I-1) det

How can I correctly prefix a word with “a” and “an”?

醉酒当歌 posted on 2019-12-17 07:04:35
Question: I have a .NET application where, given a noun, I want to correctly prefix that word with "a" or "an". How would I do that? Before you think the answer is to simply check whether the first letter is a vowel, consider phrases like:
    an honest mistake
    a used car
Answer 1:
    1. Download Wikipedia.
    2. Unzip it and write a quick filter program that spits out only article text (the download is generally in XML format, along with non-article metadata too).
    3. Find all instances of a(n).... and make an index on the
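The answer is cut off above, so the following is only one possible reading of the corpus-based idea: count how often each word follows "a" versus "an" in a large body of text, then pick the more frequent article. It is a minimal Python sketch rather than .NET code. The function names, the sample corpus, and the vowel-rule fallback for unseen words are all assumptions made for illustration, not part of the original answer.

```python
import re
from collections import Counter


def build_article_index(corpus):
    """Count how often each word follows "a" versus "an" in the corpus."""
    counts = {}
    for article, word in re.findall(r"\b(an?)\s+([a-z]+)", corpus.lower()):
        counts.setdefault(word, Counter())[article] += 1
    return counts


def choose_article(word, index):
    """Return the article seen most often before `word` in the index."""
    seen = index.get(word.lower())
    if seen:
        return seen.most_common(1)[0][0]
    # Assumed fallback for words missing from the index: the naive vowel rule.
    return "an" if word[0].lower() in "aeiou" else "a"


# Hypothetical stand-in for the extracted Wikipedia article text.
corpus = (
    "It was an honest mistake. She bought a used car. "
    "An honest answer. He ate an apple."
)
index = build_article_index(corpus)

for noun in ("honest", "used", "umbrella"):
    print(choose_article(noun, index), noun)
```

The index handles exactly the cases the question worries about ("an honest", "a used") because it records usage rather than spelling; the fallback only matters for words the corpus never pairs with an article.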

Fast/Optimize N-gram implementations in python

旧城冷巷雨未停 posted on 2019-12-17 06:50:10
Question: Which n-gram implementation is fastest in Python? I've tried to profile NLTK's against Scott's zip version (http://locallyoptimal.com/blog/2013/01/20/elegant-n-gram-generation-in-python/):
    from nltk.util import ngrams as nltkngram
    import this, time
    def zipngram(text,n=2):
        return zip(*[text.split()[i:] for i in range(n)])
    text = this.s
    start = time.time()
    nltkngram(text.split(), n=2)
    print time.time() - start
    start = time.time()
    zipngram(text, n=2)
    print time.time() - start
[out]
    0.000213146209717
    6
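The question's code is Python 2 and times a single call with `time.time()`, which is noisy at the microsecond scale. A Python 3 sketch of the same zip technique is shown below. Two points to keep in mind: in Python 3, `zip` is lazy, so the result has to be materialized (here with `list`) before the timing reflects any real work, and `nltk.util.ngrams` may likewise return a lazy iterator depending on the NLTK version. The name `zip_ngrams` and the sample sentence are illustrative, and NLTK is left out so the block runs with the standard library alone.

```python
import timeit


def zip_ngrams(tokens, n=2):
    """Return the n-grams of a token sequence as a list of tuples."""
    # Zip together n shifted views of the token list: tokens[0:], tokens[1:], ...
    return list(zip(*(tokens[i:] for i in range(n))))


tokens = "the quick brown fox jumps".split()
bigrams = zip_ngrams(tokens, n=2)
print(bigrams)

# timeit repeats the call many times, which is steadier than one time.time() delta.
elapsed = timeit.timeit(lambda: zip_ngrams(tokens, n=2), number=10_000)
print(f"{elapsed:.4f}s for 10,000 calls")
```

To compare against another implementation fairly, wrap it the same way (materialize its output with `list`) and time it with the same `number` of repetitions.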

ArrayList as key in HashMap

只愿长相守 posted on 2019-12-17 06:48:11
Question: Would it be possible to use an ArrayList as the key of a HashMap? I would like to keep the frequency count of bigrams. The bigram is the key and the value is its frequency. For each bigram, like "he is", I create an ArrayList for it and insert it into the HashMap. But I am not getting the correct output.
    public HashMap<ArrayList<String>, Integer> getBigramMap(String word1, String word2) {
        HashMap<ArrayList<String>, Integer> hm = new HashMap<ArrayList<String>, Integer>();
        ArrayList

Using NLTK and WordNet; how do I convert simple tense verb into its present, past or past participle form?

爷,独闯天下 posted on 2019-12-17 06:34:19
Question: Using NLTK and WordNet, how do I convert a simple-tense verb into its present, past, or past participle form? For example, I want to write a function that gives me the verb in the expected form, as follows:
    v = 'go'
    present = present_tense(v)
    print present # prints "going"
    past = past_tense(v)
    print past # prints "went"
Answer 1: I think what you're looking for is the NodeBox::Linguistics library. It does exactly that:
    print en.verb.present("gave")
    >>> give
Answer 2: With the help of NLTK this can also be

NLTK Named Entity recognition to a Python list

别说谁变了你拦得住时间么 posted on 2019-12-17 06:05:07
Question: I used NLTK's ne_chunk to extract named entities from a text:
    my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
    nltk.ne_chunk(my_sent, binary=True)
But I can't figure out how to save these entities

Fuzzy String Comparison

╄→尐↘猪︶ㄣ posted on 2019-12-17 05:36:26
Question: What I am striving to complete is a program that reads in a file and compares each sentence against the original sentence. A sentence that is a perfect match to the original will receive a score of 1, and a sentence that is the total opposite will receive a 0. All other fuzzy sentences will receive a grade between 0 and 1. I am unsure which operation to use to complete this in Python 3. I have included the sample text in which the Text 1 is the original and the
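No answer survives in this excerpt, so the following is just one standard-library option that fits the 0-to-1 scoring the question describes: `difflib.SequenceMatcher.ratio()`. The `similarity` helper and the sample sentences are hypothetical. Note that the score is purely character-based, so it measures surface overlap rather than meaning, and a sentence with the opposite meaning can still score high.

```python
from difflib import SequenceMatcher


def similarity(original, candidate):
    """Return a score in [0, 1]; 1.0 means the strings match exactly (ignoring case)."""
    return SequenceMatcher(None, original.lower(), candidate.lower()).ratio()


# Hypothetical sample sentences; the first one plays the role of the original.
original = "The cat sat on the mat"
candidates = [
    "The cat sat on the mat",
    "The cat sits on the mat",
    "A dog ran through the park",
]

for sentence in candidates:
    print(f"{similarity(original, sentence):.2f}  {sentence}")
```

Lowercasing before comparison is a design choice made here so that capitalization alone does not lower the score; drop it if case differences should count.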

How do I do word Stemming or Lemmatization?

元气小坏坏 posted on 2019-12-17 02:52:09
Question: I've tried PorterStemmer and Snowball, but neither works on all words; both miss some very common ones. My test words are "cats running ran cactus cactuses cacti community communities", and both get less than half right. See also:
    Stemming algorithm that produces real words
    Stemming - code examples or open source projects?
Answer 1: If you know Python, the Natural Language Toolkit (NLTK) has a very powerful lemmatizer that makes use of WordNet. Note that if you are using this lemmatizer for the