nlp

Compose a synthetic English phrase that would contain 160 bits of recoverable information

主宰稳场 submitted on 2019-12-18 12:28:14
Question: I have 160 bits of random data. Just for fun, I want to generate a pseudo-English phrase to "store" this information in, and I want to be able to recover the information from the phrase. Note: this is not a security question; I don't care whether someone else can recover the information or even detect that it is there. Criteria for better phrases, from most important to least important: short, unique, natural-looking. The current approach, suggested here: Take three lists of 1024 nouns, verbs
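The suggested approach maps 10-bit chunks of the data to indices into 1024-entry word lists, so 16 words cover 160 bits. A minimal Python sketch of that idea, using a single generated placeholder list in place of real noun/verb/adjective lists:

import secrets

# Placeholder 1024-entry word list; a real implementation would use
# curated noun/verb/adjective lists so the phrase reads naturally.
WORDS = [f"word{i:04d}" for i in range(1024)]
WORD_TO_INDEX = {w: i for i, w in enumerate(WORDS)}

def encode(bits160: int) -> str:
    """Turn a 160-bit integer into 16 words of 10 bits each."""
    words = []
    for shift in range(150, -10, -10):          # 150, 140, ..., 0
        words.append(WORDS[(bits160 >> shift) & 0x3FF])
    return " ".join(words)

def decode(phrase: str) -> int:
    """Recover the 160-bit integer from the phrase."""
    value = 0
    for w in phrase.split():
        value = (value << 10) | WORD_TO_INDEX[w]
    return value

data = secrets.randbits(160)
assert decode(encode(data)) == data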

Count verbs, nouns, and other parts of speech with Python's NLTK

江枫思渺然 submitted on 2019-12-18 12:14:44
Question: I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used. I have tagged the text but am not sure how to go further: tokens = nltk.word_tokenize(text.lower()) text = nltk.Text(tokens) tags = nltk.pos_tag(text) How can I save the counts for each part of speech into a variable? Answer 1: The pos_tag method gives you back a list of (token, tag) pairs:
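A minimal sketch of the counting step, assuming the relevant NLTK tokenizer and tagger data have already been downloaded: a Counter over the tags in the (token, tag) pairs returned by pos_tag.

from collections import Counter
import nltk

text = "The quick brown fox jumps over the lazy dog"
tokens = nltk.word_tokenize(text.lower())
tags = nltk.pos_tag(tokens)              # list of (token, tag) pairs

# Count how many times each part-of-speech tag occurs.
counts = Counter(tag for _, tag in tags)
print(counts)          # a Counter keyed by tag, e.g. 'NN', 'JJ', 'DT', ...
print(counts["NN"])    # count for singular nouns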

Stemming - code examples or open source projects?

馋奶兔 submitted on 2019-12-18 11:36:10
Question: Stemming is something that's needed in tagging systems. I use Delicious, and I don't have time to manage and prune my tags. I'm a bit more careful with my blog, but it isn't perfect. I write software for embedded systems that would be much more functional (helpful to the user) if it included stemming. For instance: Parse, Parser, Parsing should all mean the same thing to whatever system I'm putting them into. Ideally there's a BSD-licensed stemmer somewhere, but if not, where do I look to
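For a quick experiment, NLTK ships a Porter stemmer (note that NLTK itself is Apache-licensed rather than BSD, so check that the licence fits the project). A minimal sketch:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["parse", "parser", "parsing"]:
    # Prints each word with its stem, e.g. parse -> pars, parsing -> pars.
    print(word, "->", stemmer.stem(word))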

Ruby, Count syllables

…衆ロ難τιáo~ submitted on 2019-12-18 11:35:25
Question: I am using Ruby to calculate the Gunning Fog Index of some content that I have. I can successfully implement the algorithm described here: Gunning Fog Index. I am using the method below to count the number of syllables in each word: Tokenizer = /([aeiouy]{1,3})/ def count_syllables(word) len = 0 if word[-3..-1] == 'ing' then len += 1 word = word[0...-3] end got = word.scan(Tokenizer) len += got.size() if got.size() > 1 and got[-1] == ['e'] and word[-1].chr() == 'e' and word[-2].chr() != 'l'
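The Ruby snippet above is cut off by the excerpt. A rough Python transliteration of the same heuristic (count vowel groups, special-case a trailing 'ing'); the silent-'e' adjustment at the end is an assumption about how the truncated condition continues:

import re

VOWEL_GROUPS = re.compile(r"[aeiouy]{1,3}")

def count_syllables(word: str) -> int:
    """Rough syllable count based on vowel groups."""
    word = word.lower()
    count = 0
    if word.endswith("ing"):
        count += 1
        word = word[:-3]
    groups = VOWEL_GROUPS.findall(word)
    count += len(groups)
    # Assumed continuation of the truncated condition: discount a silent
    # final 'e' (but keep it after 'l', as in "syllable").
    if len(groups) > 1 and groups[-1] == "e" and word.endswith("e") and not word.endswith("le"):
        count -= 1
    return max(count, 1)

for w in ["parsing", "syllable", "orange", "fog"]:
    print(w, count_syllables(w))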

Using word2vec to classify words in categories

别来无恙 submitted on 2019-12-18 11:31:31
Question: BACKGROUND: I have vectors with some sample data, and each vector has a category name (Places, Colors, Names). ['john','jay','dan','nathan','bob'] -> 'Names' ['yellow', 'red','green'] -> 'Colors' ['tokyo','bejing','washington','mumbai'] -> 'Places' My objective is to train a model that takes a new input string and predicts which category it belongs to. For example, if a new input is "purple" then I should be able to predict 'Colors' as the correct category. If the new input is "Calgary" it should
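One common way to approach this is to average pretrained word2vec vectors per category and assign a new word to the nearest centroid by cosine similarity. A sketch assuming gensim and a pretrained word2vec-format file (the file name here is a placeholder):

import numpy as np
from gensim.models import KeyedVectors

# Placeholder file name: any pretrained word2vec-format model works here.
kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

training = {
    "Names":  ["john", "jay", "dan", "nathan", "bob"],
    "Colors": ["yellow", "red", "green"],
    "Places": ["tokyo", "beijing", "washington", "mumbai"],
}

# One centroid vector per category, skipping words missing from the model.
centroids = {
    label: np.mean([kv[w] for w in words if w in kv], axis=0)
    for label, words in training.items()
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(word):
    v = kv[word]
    return max(centroids, key=lambda label: cosine(v, centroids[label]))

print(classify("purple"))   # expected: Colors
print(classify("calgary"))  # expected: Places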

Natural Language Parsing tools: what is out there and what is not? [closed]

十年热恋 submitted on 2019-12-18 11:13:22
Question: I'm looking for various NLP tools for a project I'm working on, and so far I've found the Stanford NLP projects the most useful. Does anyone know if there are other tools out there that would be useful for a language understander? And more importantly, are there tools that are NOT out there? Most

Word frequencies from strings in Postgres?

纵然是瞬间 submitted on 2019-12-18 11:11:59
Question: Is it possible to identify distinct words, and a count for each, from fields containing text strings in Postgres? Answer 1: Something like this? SELECT some_pk, regexp_split_to_table(some_column, '\s') as word FROM some_table Getting the distinct words is easy then: SELECT DISTINCT word FROM ( SELECT regexp_split_to_table(some_column, '\s') as word FROM some_table ) t or getting the count for each word: SELECT word, count(*) FROM ( SELECT regexp_split_to_table(some_column, '\s') as word FROM some
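The last query in the answer is cut off; presumably it finishes with a GROUP BY over the split-out words. A Python sketch that runs the completed count query through psycopg2, keeping the answer's hypothetical some_table/some_column names:

import psycopg2

def word_frequencies(dsn):
    # Split each text field into words and count how often each word occurs.
    query = r"""
        SELECT word, count(*) AS freq
        FROM (
            SELECT regexp_split_to_table(some_column, '\s') AS word
            FROM some_table
        ) t
        GROUP BY word
        ORDER BY freq DESC;
    """
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()

# Connection string is a placeholder:
# for word, freq in word_frequencies("dbname=mydb user=me"):
#     print(word, freq)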

How do I train a Named Entity Recognizer in OpenNLP?

*爱你&永不变心* submitted on 2019-12-18 10:54:08
Question: OK, I have the following code to train the NER identifier from OpenNLP: FileReader fileReader = new FileReader("train.txt"); ObjectStream fileStream = new PlainTextByLineStream(fileReader); ObjectStream sampleStream = new NameSampleDataStream(fileStream); TokenNameFinderModel model = NameFinderME.train("pt-br", "train", sampleStream, Collections.<String, Object>emptyMap()); nfm = new NameFinderME(model); I don't know if I'm doing something wrong or if something is missing, but the classifying

How to turn plural words singular?

半世苍凉 submitted on 2019-12-18 10:53:25
Question: I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now: If a word ends with -ies, I replace the ending with -y. If a word ends with -es, I remove this ending; this doesn't always work, however - for example, it turns Types into Typ. Otherwise, I just remove the trailing -s. Does anyone know of a better algorithm? Answer 1: Those are all general rules
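A minimal Python sketch of those suffix rules plus an exception table for the words the rules get wrong; the exception entries here are only illustrative, and a real inflector needs a much longer irregular list.

# Exception table checked before the suffix rules; extend as needed.
IRREGULAR = {
    "types": "type",
    "people": "person",
    "children": "child",
    "statuses": "status",
}

def singularize(word: str) -> str:
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies"):
        return w[:-3] + "y"
    if w.endswith("es"):
        return w[:-2]
    if w.endswith("s"):
        return w[:-1]
    return w

for w in ["categories", "types", "orders", "people"]:
    print(w, "->", singularize(w))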

Determine if a sentence is an inquiry

橙三吉。 submitted on 2019-12-18 10:37:10
Question: How can I detect if a search query is in the form of a question? For example, a customer might search for "how do I track my order" (notice: no question mark). I'm guessing most direct questions would conform to a particular grammar. Very simple guessing approach: START WORDS = [who, what, when, where, why, how, is, can, does, do] isQuestion(sentence): sentence ends with '?' OR sentence starts with one of START WORDS The START WORDS list could be longer. The scope is a website search box, so I
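A direct Python rendering of that guessing approach:

# A sentence is treated as a question if it ends with "?" or starts
# with one of the interrogative/auxiliary start words from the question.
START_WORDS = {"who", "what", "when", "where", "why", "how",
               "is", "can", "does", "do"}

def is_question(sentence: str) -> bool:
    s = sentence.strip().lower()
    if not s:
        return False
    if s.endswith("?"):
        return True
    return s.split()[0] in START_WORDS

print(is_question("how do I track my order"))   # True
print(is_question("track order 12345"))         # False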