nlp

Does anyone know how to configure the hunpos wrapper class on nltk?

Submitted by 删除回忆录丶 on 2019-12-20 01:08:29
Question: I installed english-wsj-1.0 and hunpos-1.0-linux.tgz from http://code.google.com/p/hunpos/downloads/list and extracted the files into my '~/' directory. Then I tried the following Python code:

```python
import nltk
from nltk.tag import hunpos
from nltk.tag.hunpos import HunposTagger
import os, sys, re, glob

cwd = os.getcwd()
for infile in glob.glob(os.path.join(cwd, '*.txt')):
    (PATH, FILENAME) = os.path.split(infile)
    read = open(infile)
    ht = HunposTagger('english.model')
    ht
```

Identify prepositions and individual POS

Submitted by 匆匆过客 on 2019-12-19 19:52:10
Question: I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at one point: I want to identify the prepositions in the paragraph, but the Penn Treebank tagset says that IN covers both prepositions and subordinating conjunctions. How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only the prepositions from the paragraph in this case? Answer 1: You can't be sure. The reason for this somewhat strange PoS is that it's really
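Assuming the paragraph has already been run through a tagger that returns (word, tag) pairs, pulling out the IN-tagged tokens is a simple filter. A minimal sketch (the hand-written `tagged` list below stands in for real tagger output, which would come from the Stanford tagger or nltk.pos_tag):

```python
# Filter IN-tagged tokens from tagger output.
# NOTE: the Penn Treebank tag IN covers both prepositions and
# subordinating conjunctions, so this cannot fully separate the two.

def extract_in_tokens(tagged_pairs):
    """Return the words whose tag is 'IN' from a list of (word, tag) pairs."""
    return [word for word, tag in tagged_pairs if tag == "IN"]

# Example tagger output (hand-written here for illustration):
tagged = [("He", "PRP"), ("sat", "VBD"), ("on", "IN"),
          ("the", "DT"), ("chair", "NN"), ("because", "IN"),
          ("it", "PRP"), ("rained", "VBD")]

print(extract_in_tokens(tagged))  # ['on', 'because']
```

As the answer notes, the tag alone cannot distinguish the two uses; a common heuristic is that an IN token introducing a full clause (subject plus verb, like "because" above) is likely a subordinating conjunction, but only a parser can decide reliably.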

Looking for a database or text file of English words with their different forms

Submitted by 生来就可爱ヽ(ⅴ<●) on 2019-12-19 19:46:57
Question: I am working on a project and I need to get the root of a given word (stemming). As you know, stemming algorithms that don't use a dictionary are not accurate. I also tried WordNet, but it is not a good fit for my project. I found the phpmorphy project, but it doesn't offer a Java API. So now I am looking for a database or a text file of English words with their different forms, for example:

run running ran ...
include including included ...
...

Thank you for your help or advice. Answer 1:
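If such a word list is found, turning it into a form-to-root lookup table is straightforward. A minimal sketch in Python, assuming a hypothetical file format where each line lists a root word followed by its inflections (as in the question's example):

```python
# Build a form -> root lookup from a plain-text word-forms list.
# Assumed (hypothetical) format: each line holds a root word followed
# by its inflected forms, e.g. "run running ran".

def build_root_lookup(lines):
    """Map every inflected form (and the root itself) back to its root."""
    lookup = {}
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        root = words[0]
        for form in words:
            lookup[form] = root
    return lookup

# In practice these lines would come from the downloaded word list.
data = ["run running ran", "include including included"]
lookup = build_root_lookup(data)
print(lookup["ran"])        # run
print(lookup["including"])  # include
```

A dictionary lookup like this sidesteps the accuracy problems of rule-based stemmers entirely, at the cost of only covering words present in the list.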

PDFminer: PDFTextExtractionNotAllowed Error

Submitted by 左心房为你撑大大i on 2019-12-19 18:24:09
Question: I'm trying to extract text from PDFs I've scraped off the internet, but when I attempt to process them I get this error:

```
File "/usr/local/lib/python2.7/dist-packages/pdfminer/pdfpage.py", line 124, in get_pages
    raise PDFTextExtractionNotAllowed('Text extraction is not allowed: %r' % fp)
PDFTextExtractionNotAllowed: Text extraction is not allowed <cStringIO.StringO object at 0x7f79137a1ab0>
```

I've checked Stack Overflow, and someone else who had this error found their PDFs to be secured with a

Semantic Relatedness Algorithms - python [closed]

Submitted by 邮差的信 on 2019-12-19 13:45:11
Question: I want to find the relatedness between two synsets, and I have come across many algorithms such as Resnik, Lin, Wu-Palmer, the path algorithm, and Leacock-Chodorow. Can somebody tell me which one is the most efficient among these algorithms? Answer 1: From a "show me an example" perspective, here's an example
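To give a flavor of the simplest of these measures: the path algorithm scores two concepts by the shortest path between them in the hypernym hierarchy, as 1 / (shortest_path_length + 1). A self-contained sketch on a tiny hand-built taxonomy (the hierarchy below is illustrative only, not WordNet):

```python
from collections import deque

# Tiny hand-built hypernym graph (child -> parent); illustrative only.
parents = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
}

def neighbors(node):
    """Undirected neighbors: the node's parent plus all of its children."""
    result = []
    if node in parents:
        result.append(parents[node])
    result.extend(child for child, parent in parents.items() if parent == node)
    return result

def path_similarity(a, b):
    """1 / (shortest path length + 1), as in the path measure."""
    queue, seen = deque([(a, 0)]), {a}   # breadth-first search from a
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return 1.0 / (dist + 1)
        for n in neighbors(node):
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    return 0.0  # no path between the two concepts

print(path_similarity("dog", "wolf"))  # 2 edges apart -> 1/3
print(path_similarity("dog", "cat"))   # 4 edges apart -> 1/5
```

The other measures (Resnik, Lin, Leacock-Chodorow, Wu-Palmer) refine this idea with information content or taxonomy depth; in NLTK's WordNet interface they are all available as methods on synsets.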

Return a list of matches by given phrase

Submitted by 北战南征 on 2019-12-19 11:29:34
Question: I'm trying to write a method that checks whether a given phrase matches at least one item from a list of phrases and returns the matches. The input is a phrase, a list of phrases, and a dictionary of synonym lists. The point is to make it universal. Here is an example:

```python
phrase = 'This is a little house'
dictSyns = {'little': ['small', 'tiny', 'little'], 'house': ['cottage', 'house']}
listPhrases = ['This is a tiny house', 'This is a small cottage', 'This is a small building', 'I need advice']
```

I can create
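One way to make the matching universal is to normalize every word to a canonical representative via the synonym dictionary before comparing phrases. A minimal sketch using the question's own example data (normalize-and-compare is my assumption about the intended behavior):

```python
def build_canonical_map(dict_syns):
    """Map every synonym to its canonical key, e.g. 'tiny' -> 'little'."""
    canon = {}
    for key, syns in dict_syns.items():
        for s in syns:
            canon[s] = key
    return canon

def normalize(phrase, canon):
    """Lowercase, split, and replace each word by its canonical form."""
    return tuple(canon.get(w, w) for w in phrase.lower().split())

def matching_phrases(phrase, list_phrases, dict_syns):
    """Return every phrase in list_phrases equivalent to phrase under the synonyms."""
    canon = build_canonical_map(dict_syns)
    target = normalize(phrase, canon)
    return [p for p in list_phrases if normalize(p, canon) == target]

phrase = 'This is a little house'
dictSyns = {'little': ['small', 'tiny', 'little'], 'house': ['cottage', 'house']}
listPhrases = ['This is a tiny house', 'This is a small cottage',
               'This is a small building', 'I need advice']

print(matching_phrases(phrase, listPhrases, dictSyns))
# ['This is a tiny house', 'This is a small cottage']
```

'This is a small building' is rejected because 'building' is not listed as a synonym of 'house'; extending the synonym dictionary is all it takes to change that.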

How to parse languages other than English with the Stanford Parser in Java, not on the command line?

Submitted by 时间秒杀一切 on 2019-12-19 11:19:38
Question: I have been trying to use the Stanford Parser in my Java program to parse sentences in Chinese. Since I am quite new to both Java and the Stanford Parser, I used 'ParseDemo.java' to practice. The code works fine with English sentences and produces the right output. However, when I changed the model to 'chinesePCFG.ser.gz' and tried to parse some segmented Chinese sentences, things went wrong. Here's my Java code:

```java
class ParserDemo {
    public static void main(String[] args) {
```

JAPE file to find a pattern within a sentence

Submitted by 人走茶凉 on 2019-12-19 09:26:32
Question: I need to annotate part of a sentence if the words I have listed in my JAPE rule appear in the same sentence. E.g., the sentence is "The child cannot resist any changes to his routine". I have put words like 'resist' in a "trouble.lst" file and 'changes' in an "alteration.lst" file. In this sentence I need to annotate the part "resist any changes" as "A3b". I have tried the code below, but it does not restrict matches to words in the same sentence; my JAPE rule is matching words from different sentences

Get the word under the mouse cursor in Windows

Submitted by 风格不统一 on 2019-12-19 09:08:03
Question: Greetings everyone. A friend and I are discussing a possible new project: a translation program that pops up a translation whenever you hover over any word in any control, even static, non-editable ones. I know there are many browser plugins that do this sort of thing on web pages; we're thinking about how to do it system-wide on Windows. Of course, the key difficulty is figuring out which word the user is hovering over. I'm aware of MSAA and UI Automation, but as far as I can