linguistics

Is there a fairly simple way for a script to tell (from context) whether “her” is a possessive pronoun?

谁都会走 submitted on 2019-11-30 14:04:41
Question: I am writing a script to reverse all genders in a piece of text, so all gendered words are swapped - "man" is swapped with "woman", "she" is swapped with "he", etc. But there is an ambiguity as to whether "her" should be replaced with "him" or "his".
Answer 1: Okay. Let's look at this like a linguist might. I am thinking aloud here. "Her" is a pronoun. It can be either a:
1. possessive pronoun: This is her book.
2. personal pronoun: Give it to her. (after preposition) He wrote her a letter.

Probability tree for sentences in nltk employing both lookahead and lookback dependencies

你。 submitted on 2019-11-30 11:47:40
Does nltk or any other NLP tool allow constructing probability trees from input sentences, so that the language model of the input text is stored in a dictionary tree? The following example gives the rough idea, but I need the same functionality such that a word Wt is modelled probabilistically not only on past input words (history) Wt-n but also on lookahead words such as Wt+m. The lookback and lookahead word counts should also be 2 or more, i.e. bigrams or more. Are there any other libraries in Python which achieve this?
    from collections import defaultdict
    import nltk
    import math
    ngram =
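One way to picture the two-sided conditioning asked about here is a plain dictionary keyed by the pair (left context, right context). The following is only a minimal sketch under that assumption: it uses a flat dictionary rather than a tree, does not use nltk, and the names train_context_model and word_probability are hypothetical.

```python
from collections import Counter, defaultdict


def train_context_model(sentences, back=2, ahead=2):
    """Count which word fills each (left context, right context) slot."""
    model = defaultdict(Counter)
    for sentence in sentences:
        # Pad both ends so that words near the edges also get full contexts.
        words = ["<s>"] * back + sentence.lower().split() + ["</s>"] * ahead
        for t in range(back, len(words) - ahead):
            left = tuple(words[t - back:t])            # lookback words
            right = tuple(words[t + 1:t + 1 + ahead])  # lookahead words
            model[(left, right)][words[t]] += 1
    return model


def word_probability(model, left, right, word):
    """Relative frequency of `word` given both contexts (0.0 if unseen)."""
    counts = model.get((tuple(left), tuple(right)))
    if not counts:
        return 0.0
    return counts[word] / sum(counts.values())


sentences = ["the cat sat on the mat", "the dog sat on the rug"]
model = train_context_model(sentences, back=1, ahead=1)
print(word_probability(model, ["the"], ["sat"], "cat"))
```

With larger values of back and ahead the contexts become sparse quickly, so a real model would probably need smoothing or backoff, which this sketch leaves out.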

Is there a fairly simple way for a script to tell (from context) whether “her” is a possessive pronoun?

℡╲_俬逩灬. submitted on 2019-11-30 09:36:28
I am writing a script to reverse all genders in a piece of text, so all gendered words are swapped - "man" is swapped with "woman", "she" is swapped with "he", etc. But there is an ambiguity as to whether "her" should be replaced with "him" or "his".
Okay. Let's look at this like a linguist might. I am thinking aloud here. "Her" is a pronoun. It can be either a:
1. possessive pronoun: This is her book.
2. personal pronoun: Give it to her. (after preposition) He wrote her a letter. (indirect object) He treated her for a cold. (direct object)
So let's look at case (1), possessive pronoun. That is
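The four example sentences above suggest a crude rule of thumb: "her" is probably possessive when the next word could start a noun phrase, and probably an object when it is followed by a determiner, a preposition, or punctuation. Here is a minimal sketch of that idea. The word list and the names NON_NOUN_FOLLOWERS, looks_possessive and swap_her are assumptions made for illustration, not part of the original answer.

```python
import re

# Words that typically follow an object "her" rather than a possessive one.
# This list is an illustrative assumption and is far from exhaustive.
NON_NOUN_FOLLOWERS = {
    "a", "an", "the", "this", "that", "these", "those",
    "to", "for", "from", "with", "in", "on", "at", "by", "of", "about",
    "and", "or", "but", "so", "because", "when", "while", "if",
    "is", "was", "are", "were", "will", "would", "can", "could",
}


def looks_possessive(next_token):
    """Guess that "her" is possessive if a content word follows it."""
    return next_token.isalpha() and next_token.lower() not in NON_NOUN_FOLLOWERS


def swap_her(sentence):
    """Replace each "her" with "his" or "him" based on the following token."""
    tokens = list(re.finditer(r"[A-Za-z]+|[^A-Za-z\s]", sentence))
    pieces, last = [], 0
    for idx, tok in enumerate(tokens):
        if tok.group().lower() != "her":
            continue
        nxt = tokens[idx + 1].group() if idx + 1 < len(tokens) else ""
        replacement = "his" if looks_possessive(nxt) else "him"
        if tok.group()[0].isupper():
            replacement = replacement.capitalize()
        pieces.append(sentence[last:tok.start()])
        pieces.append(replacement)
        last = tok.end()
    pieces.append(sentence[last:])
    return "".join(pieces)


for s in ["This is her book.", "Give it to her.",
          "He wrote her a letter.", "He treated her for a cold."]:
    print(swap_her(s))
```

A heuristic like this will misfire on sentences such as "I saw her dance", where the next word can be either a noun or a verb; a part-of-speech tagger that distinguishes PRP from PRP$ would likely be more reliable.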

Implementing Read typeclass where parsing strings includes “$”

空扰寡人 submitted on 2019-11-30 05:42:29
Question: I've been playing with Haskell for about a month. For my first "real" Haskell project I'm writing a part-of-speech tagger. As part of this project I have a type called Tag that represents a part-of-speech tag, implemented as follows:
    data Tag = CC | CD | DT | EX | FW | IN | JJ | JJR | JJS ...
The above is a long list of standardized part-of-speech tags which I've intentionally truncated. However, in this standard set of tags there are two that end in a dollar sign ($): PRP$ and NNP$.

Generating the plural form of a noun

為{幸葍}努か submitted on 2019-11-30 03:59:21
Question: Given a word, which may or may not be a singular-form noun, how would you generate its plural form? Based on this NLTK tutorial and this informal list on pluralization rules, I wrote this simple function:
    def plural(word):
        """ Converts a word to its plural form. """
        if word in c.PLURALE_TANTUMS:
            # defective nouns, fish, deer, etc
            return word
        elif word in c.IRREGULAR_NOUNS:
            # foot->feet, person->people, etc
            return c.IRREGULAR_NOUNS[word]
        elif word.endswith('fe'):
            # knife -> knives
            return word[:
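Since the function above is cut off, here is a separate, self-contained sketch of the same rule-based approach. It does not reconstruct the asker's code: the tables UNCHANGED and IRREGULAR are tiny hypothetical stand-ins for the c.PLURALE_TANTUMS and c.IRREGULAR_NOUNS constants, and the suffix rules are the usual textbook ones.

```python
# Tiny illustrative tables; a real implementation would need much larger ones.
UNCHANGED = {"fish", "deer", "sheep", "scissors"}
IRREGULAR = {"foot": "feet", "person": "people",
             "child": "children", "mouse": "mice"}


def plural(word):
    """Return a guessed plural form using lookup tables and suffix rules."""
    if word in UNCHANGED:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("fe"):                       # knife -> knives
        return word[:-2] + "ves"
    if word.endswith("f"):                        # wolf -> wolves
        return word[:-1] + "ves"
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"                  # city -> cities
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"                        # box -> boxes
    return word + "s"                             # cat -> cats


for w in ["wolf", "knife", "city", "day", "box", "foot", "deer", "cat"]:
    print(w, "->", plural(w))
```

Rules like the "f" rule over-generate (for example "roof" would become "rooves"), so exceptions would have to go into the irregular table.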

Extracting “((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun” from Text (Justeson & Katz, 1995)

蓝咒 submitted on 2019-11-29 08:47:43
Is it possible to extract the pattern ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun, proposed by Justeson and Katz (1995), with the R package openNLP? That is, I would like to use this linguistic filter to extract candidate noun phrases. I do not fully understand its meaning. Could you explain it or translate this representation into R? Many thanks. Maybe we can start the sample code from:
    library("openNLP")
    acq <- "This paper describes a novel optical thread plug gauge (OTPG) for internal thread inspection using machine vision. The OTPG is composed of a
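Read as a regular expression over part-of-speech tags, the pattern as written in the question says: either one or more adjectives or nouns followed by a noun, or an adjective or noun, an optional noun-preposition pair, another adjective or noun, and then a final noun. A minimal sketch of applying it is shown below, in Python rather than R and without openNLP. It assumes the text has already been tagged with Penn Treebank style tags. The names tag_class and candidate_phrases are hypothetical, and the pattern is a parameter because the published form of the filter may differ from the one quoted in the question.

```python
import re

# The pattern exactly as quoted in the question, with A = adjective,
# N = noun, P = preposition. One character per token.
QUESTION_PATTERN = r"(?:(?:A|N)+|(?:A|N)(?:NP)?(?:A|N))N"


def tag_class(tag):
    """Collapse a Penn Treebank tag to A, N, P, or O (anything else)."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


def candidate_phrases(tagged, pattern=QUESTION_PATTERN):
    """Return word sequences whose tag classes match the pattern."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in re.finditer(pattern, classes)]


# Hand-tagged fragment of the sample sentence (the tags are an assumption).
tagged = [("a", "DT"), ("novel", "JJ"), ("optical", "JJ"), ("thread", "NN"),
          ("plug", "NN"), ("gauge", "NN"), ("for", "IN"),
          ("internal", "JJ"), ("thread", "NN"), ("inspection", "NN")]
print(candidate_phrases(tagged))
```

Because each token becomes exactly one character, match offsets in the class string map directly back to token positions, which is what makes the slicing in candidate_phrases work.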

How can I programmatically generate Heroku-like subdomain names?

不想你离开。 submitted on 2019-11-28 15:45:49
Question: We've all seen the interesting subdomains that are automatically assigned when you deploy an app to Heroku with a bare "heroku create". Some examples: blazing-mist-4652, electric-night-4641, morning-frost-5543, radiant-river-7322, and so on. It seems they all follow an adjective-noun-4digitnumber pattern (for the most part). Did they simply type out a dictionary of some adjectives and nouns, then choose combinations from them at random when you push an app? Is there a Ruby gem that
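Whatever Heroku actually does internally, the adjective-noun-number pattern described in the question is easy to imitate with two word lists and a random number. The sketch below is in Python rather than Ruby; the word lists are taken from the examples in the question, and the function name haiku_name is hypothetical.

```python
import random

# Word lists seeded from the examples in the question; extend as needed.
ADJECTIVES = ["blazing", "electric", "morning", "radiant"]
NOUNS = ["mist", "night", "frost", "river"]


def haiku_name(rng=random):
    """Return a name of the form adjective-noun-NNNN."""
    adjective = rng.choice(ADJECTIVES)
    noun = rng.choice(NOUNS)
    number = rng.randint(1000, 9999)  # always four digits
    return f"{adjective}-{noun}-{number}"


print(haiku_name())
```

Names generated this way are not guaranteed to be unique, so a real service would need to check each one against the names already in use and retry on a collision.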

How do I determine if a random string sounds like English?

馋奶兔 submitted on 2019-11-28 08:11:56
I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words? I.e. discard RDLO while keeping LORD. EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example, KEAL would be accepted.
You can build a Markov chain from a huge English text. Afterwards you can feed words into the Markov chain and check how high the probability is that the word is English. See here: http://en.wikipedia.org/wiki/Markov_chain At the bottom of the page you can see the Markov
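The Markov chain idea can be sketched with letter bigrams: count how often each letter follows another in known English words, then score a candidate by the average log-probability of its letter transitions. The code below is a minimal illustration under several assumptions: the training list is a toy stand-in for the "huge English text", add-one smoothing is used for unseen transitions, and the names train and score are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy training data; a real model would be trained on a large corpus.
TRAINING_WORDS = ["lord", "word", "ford", "cord", "load", "road", "lore",
                  "real", "deal", "keel", "seal", "lead", "read", "dear",
                  "lake", "rake"]


def train(words):
    """Count letter-to-letter transitions, with ^ and $ as word boundaries."""
    counts = defaultdict(Counter)
    for word in words:
        padded = "^" + word.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts


def score(counts, word, alphabet_size=27):
    """Average log-probability per transition, with add-one smoothing."""
    padded = "^" + word.lower() + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = counts.get(a, {})
        seen = row.get(b, 0)
        context = sum(row.values())
        total += math.log((seen + 1) / (context + alphabet_size))
    return total / (len(padded) - 1)


MODEL = train(TRAINING_WORDS)
for candidate in ["LORD", "RDLO", "KEAL"]:
    print(candidate, round(score(MODEL, candidate), 3))
```

Higher (less negative) scores mean more English-like. The threshold that separates acceptable strings from rejected ones has to be chosen empirically for the training text in use.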

Extracting “((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun” from Text (Justeson & Katz, 1995)

两盒软妹~` submitted on 2019-11-28 01:53:57
Question: Is it possible to extract the pattern ((Adj|Noun)+|((Adj|Noun)(Noun-Prep)?)(Adj|Noun))Noun, proposed by Justeson and Katz (1995), with the R package openNLP? That is, I would like to use this linguistic filter to extract candidate noun phrases. I do not fully understand its meaning. Could you explain it or translate this representation into R? Many thanks. Maybe we can start the sample code from:
    library("openNLP")
    acq <- "This paper describes a novel optical

LSA - Latent Semantic Analysis - How to code it in PHP?

泄露秘密 submitted on 2019-11-27 19:11:31
I would like to implement Latent Semantic Analysis (LSA) in PHP in order to find out topics/tags for texts. Here is what I think I have to do. Is this correct? How can I code it in PHP? How do I determine which words to choose? I don't want to use any external libraries. I already have an implementation of the Singular Value Decomposition (SVD).
1. Extract all words from the given text.
2. Weight the words/phrases, e.g. with tf-idf. If weighting is too complex, just take the number of occurrences.
3. Build up a matrix: The columns are some documents from the database (the more the better?), the rows
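The first steps listed above (extract words, weight them, build the matrix) can be sketched without external libraries. The example below is in Python rather than PHP and only illustrates the data flow. It assumes rows are terms and columns are documents, uses raw counts multiplied by log(N / document frequency) as one common tf-idf variant, and stops before the SVD step. The name term_document_matrix is hypothetical.

```python
import math
import re
from collections import Counter


def term_document_matrix(documents):
    """Return (vocabulary, matrix) with one row per term, one column per document."""
    tokenised = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    counts = [Counter(tokens) for tokens in tokenised]
    vocabulary = sorted(set(word for tokens in tokenised for word in tokens))
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(word for tokens in tokenised for word in set(tokens))
    n_docs = len(documents)
    matrix = []
    for term in vocabulary:
        idf = math.log(n_docs / doc_freq[term])
        matrix.append([c[term] * idf for c in counts])
    return vocabulary, matrix


docs = ["the cat sat", "the dog sat down", "the bird flew"]
vocabulary, matrix = term_document_matrix(docs)
for term, row in zip(vocabulary, matrix):
    print(term, [round(x, 2) for x in row])
```

With this weighting, a term that occurs in every document gets weight zero in every column, which is one way such a scheme can answer the question of which words to keep.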