pos-tagger

nltk StanfordNERTagger: How to get proper nouns without capitalization

为君一笑 submitted on 2019-11-30 10:04:23
I am trying to use the StanfordNERTagger and NLTK to extract keywords from a piece of text:

    import re
    from nltk.corpus import stopwords
    from nltk.tag import StanfordNERTagger, StanfordPOSTagger

    docText = "John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."
    words = re.split(r"\W+", docText)

    stops = set(stopwords.words("english"))
    # remove stop words and tokens of two characters or fewer from the list
    words = [w for w in words if w not in stops and len(w) > 2]

    text = " ".join(words)  # renamed from `str`, which shadowed the built-in
    print(text)

    stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
    stp = StanfordPOSTagger('english-bidirectional-distsim.tagger')
    stanfordPosTagList = [word for word, pos in stp.tag(text
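Once any Penn Treebank-style tagger has produced (word, tag) pairs, multi-word proper nouns can be recovered by grouping consecutive NNP/NNPS tokens. A minimal sketch, assuming such a list is already available; `proper_noun_phrases` is a hypothetical helper and the sample tags are hand-assigned for illustration, not real tagger output. Note that this only post-processes tags: if the tagger stops emitting NNP for lowercase input, the grouping cannot recover those names.

```python
def proper_noun_phrases(tagged):
    """Group runs of consecutive NNP/NNPS tokens into proper-noun phrases."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(word)          # extend the current proper-noun run
        elif current:
            phrases.append(" ".join(current))
            current = []                  # a non-proper-noun token ends the run
    if current:                           # flush a run that ends the sentence
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Hand-tagged example (hypothetical tags, not produced by a tagger).
    sample = [("John", "NNP"), ("Donk", "NNP"), ("works", "VBZ"),
              ("for", "IN"), ("POI", "NNP"), (".", ".")]
    print(proper_noun_phrases(sample))
```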

How to use OpenNLP with Java?

大城市里の小女人 submitted on 2019-11-29 20:29:59
I want to POS-tag an English sentence and do some processing, and I would like to use OpenNLP. I have it installed. When I execute the command

    I:\Workshop\Programming\nlp\opennlp-tools-1.5.0-bin\opennlp-tools-1.5.0>java -jar opennlp-tools-1.5.0.jar POSTagger models\en-pos-maxent.bin < Text.txt

it POS-tags the input in Text.txt and gives this output:

    Loading POS Tagger model ... done (4.009s)
    My_PRP$ name_NN is_VBZ Shabab_NNP i_FW am_VBP 22_CD years_NNS old._.

    Average: 66.7 sent/s
    Total: 1 sent
    Runtime: 0.015s

I hope it is installed properly. Now, how do I do this POS tagging from inside a Java application? I
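The command-line output above uses a `word_TAG` format. If you end up post-processing that output (for example, from a script that calls the CLI), each token can be split on its last underscore, so a tag such as `PRP$` or a token such as `old.` survives intact. A minimal Python sketch; `parse_opennlp_output` is a hypothetical helper, and it is not a substitute for calling OpenNLP's Java API in-process, which is what the question asks about.

```python
def parse_opennlp_output(line):
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST underscore, so words containing "_" stay whole.
        word, sep, tag = token.rpartition("_")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


if __name__ == "__main__":
    print(parse_opennlp_output("My_PRP$ name_NN is_VBZ Shabab_NNP old._."))
```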

How to output NLTK pos_tag in the string instead of a list?

為{幸葍}努か submitted on 2019-11-29 17:28:42
I need to run nltk.pos_tag on a large dataset and need its output to look like the output of the Stanford tagger. For example, when running the following code:

    import nltk

    text = nltk.word_tokenize("We are going out.Just you and me.")
    print(nltk.pos_tag(text))

the output is:

    [('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'IN'), ('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]

whereas I need it to be like:

    We/PRP are/VBP going/VBG out.Just/NN you/PRP and/CC me/PRP ./.

I prefer not to use string functions and need direct output because the
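As far as I know, `nltk.pos_tag` always returns a list of (word, tag) tuples, so some formatting step is needed to get Stanford-style `word/TAG` text. NLTK does ship `nltk.tag.tuple2str`, which formats a single pair with a `/` separator; the sketch below does the same for a whole sentence in plain Python. `tagged_to_string` is a hypothetical helper name.

```python
def tagged_to_string(tagged, sep="/"):
    """Join (word, tag) pairs into Stanford-style 'word/TAG word/TAG ...' text."""
    return " ".join(f"{word}{sep}{tag}" for word, tag in tagged)


if __name__ == "__main__":
    tagged = [('We', 'PRP'), ('are', 'VBP'), ('going', 'VBG'), ('out.Just', 'IN'),
              ('you', 'PRP'), ('and', 'CC'), ('me', 'PRP'), ('.', '.')]
    print(tagged_to_string(tagged))
    # → We/PRP are/VBP going/VBG out.Just/IN you/PRP and/CC me/PRP ./.
```

The tag for `out.Just` stays `IN` here because the sketch only reformats what the tagger produced; it does not retag anything.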

Match POS tag and sequence of words

不想你离开。 submitted on 2019-11-29 17:28:21
I have the following two strings with their POS tags:

Sent1: "something like how writer pro or phraseology works would be really cool."

    [('something', 'NN'), ('like', 'IN'), ('how', 'WRB'), ('writer', 'NN'), ('pro', 'NN'), ('or', 'CC'), ('phraseology', 'NN'), ('works', 'NNS'), ('would', 'MD'), ('be', 'VB'), ('really', 'RB'), ('cool', 'JJ'), ('.', '.')]

Sent2: "more options like the syntax editor would be nice"

    [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'), ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]

I am looking for a way to
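The question is cut off, so the exact matching rule is unknown. One common reading is "find every place where the tags follow a given sequence"; the sketch below assumes that reading. It slides a window over the tagged sentence and treats `None` in the pattern as a wildcard, so `["MD", "VB", None, "JJ"]` matches "would be really cool" in Sent1. `find_tag_sequence` is a hypothetical helper.

```python
def find_tag_sequence(tagged, pattern):
    """Return (start_index, words) for every window whose tags match `pattern`.

    `pattern` is a list of Penn Treebank tags; None matches any tag.
    """
    n = len(pattern)
    if n == 0:
        return []
    matches = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if all(p is None or p == tag for p, (_, tag) in zip(pattern, window)):
            matches.append((i, [word for word, _ in window]))
    return matches


if __name__ == "__main__":
    sent2 = [('more', 'JJR'), ('options', 'NNS'), ('like', 'IN'), ('the', 'DT'),
             ('syntax', 'NN'), ('editor', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ')]
    print(find_tag_sequence(sent2, ["MD", "VB", "JJ"]))  # → [(6, ['would', 'be', 'nice'])]
```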

NLTK POS tagger not working

对着背影说爱祢 submitted on 2019-11-29 10:49:17
If I try this:

    import nltk

    text = nltk.word_tokenize("And now for something completely different")
    nltk.pos_tag(text)

Output:

    Traceback (most recent call last):
      File "C:/Python27/pos.py", line 3, in <module>
        nltk.pos_tag(text)
      File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\tag\__init__.py", in pos_tag
        tagger = load(_POS_TAGGER)
      File "C:\Python27\lib\site-packages\nltk-2.0.4-py2.7.egg\nltk\data.py", line 605
        resource_val = pickle.load(_open(resource_url))
    ImportError: No module named numpy.core.multiarray

It seems that the saved POS tagger model (loaded via load(_POS_TAGGER) in the traceback) requires numpy. You'll need to
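Because the failure happens while unpickling the tagger, a quick diagnostic is to check whether numpy can be found by the same interpreter that runs NLTK. A minimal standard-library sketch; `missing_modules` is a hypothetical helper. It uses `importlib.util.find_spec`, which requires Python 3.4 or later; on the Python 2.7 shown in the traceback, a plain `import numpy` inside try/except does the same job.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["numpy"])
    if missing:
        print("Install before calling nltk.pos_tag:", ", ".join(missing))
    else:
        print("numpy is importable")
```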

Error using Stanford POS Tagger in NLTK Python

老子叫甜甜 submitted on 2019-11-29 06:17:09
I am trying to use the Stanford POS Tagger in NLTK, but I am not able to run the example code given here: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford

    import nltk
    from nltk.tag.stanford import POSTagger

    st = POSTagger(r'english-bidirectional-distim.tagger', r'D:/stanford-postagger/stanford-postagger.jar')
    st.tag('What is the airspeed of an unladen swallow?'.split())

I have already added these environment variables:

    CLASSPATH = D:/stanford-postagger/stanford-postagger.jar
    STANFORD_MODELS = D:/stanford-postagger/models/

Here is the error I keep getting:

    Traceback (most recent call last):
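The model name in this snippet is spelled `english-bidirectional-distim.tagger`, while the first question on this page loads `english-bidirectional-distsim.tagger`, so a misspelled filename is worth ruling out. A minimal sketch that checks whether a model file actually exists under the directories listed in `STANFORD_MODELS`; `find_model` is a hypothetical helper and does not reproduce NLTK's own lookup logic.

```python
import os


def find_model(filename, env_var="STANFORD_MODELS"):
    """Return the first existing path to `filename` under the directories in `env_var`, else None."""
    # Multiple directories are separated by os.pathsep (';' on Windows, ':' elsewhere).
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if directory:
            candidate = os.path.join(directory, filename)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    for name in ("english-bidirectional-distim.tagger",
                 "english-bidirectional-distsim.tagger"):
        print(name, "->", find_model(name))
```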

How to apply pos_tag_sents() to pandas dataframe efficiently

梦想与她 submitted on 2019-11-28 11:04:38
In situations where you wish to POS-tag a column of text stored in a pandas DataFrame with one sentence per row, the majority of implementations on SO use the apply method:

    from nltk import pos_tag, word_tokenize

    # tag each row separately
    dfData['POSTags'] = dfData['SourceText'].apply(lambda row: pos_tag(word_tokenize(row)))

The NLTK documentation recommends using pos_tag_sents() for efficient tagging of more than one sentence. Does that apply to this example, and if so, would the code be as simple as changing pos_tag to pos_tag_sents, or does NLTK mean text sources of paragraphs?

As mentioned in the comments, pos_tag_sents() aims to reduce the
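`pos_tag_sents()` takes a list of already-tokenized sentences, so the column is tokenized first and then tagged in one batched call. In the NLTK 3.x source I have seen, `pos_tag()` constructs its default tagger on every call, so a per-row apply repeats that setup, while `pos_tag_sents()` does it once for the whole batch. The sketch below is hypothetical (`tag_texts` is not an NLTK function); the tokenizer and batch tagger are passed in so it runs without NLTK data, and the commented usage assumes NLTK, pandas and the tagger model are installed.

```python
def tag_texts(texts, tokenize, tag_sents):
    """Tokenize every text, then tag all token lists in one batched call."""
    tokenized = [tokenize(text) for text in texts]
    # One call for the whole column instead of one pos_tag() call per row.
    return tag_sents(tokenized)

# Usage with NLTK and pandas (assumes both are installed and the tagger data is downloaded):
#   import pandas as pd
#   from nltk import pos_tag_sents, word_tokenize
#   dfData['POSTags'] = pd.Series(
#       tag_texts(dfData['SourceText'], word_tokenize, pos_tag_sents), index=dfData.index)
```

Wrapping the result in a `pd.Series` with the frame's index keeps each row's tag list aligned with its source text.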

Extracting noun+noun or (adj|noun)+noun from Text

不羁岁月 submitted on 2019-11-28 07:03:39
I would like to know whether it is possible to extract noun+noun or (adj|noun)+noun sequences with the R package openNLP. That is, I would like to use linguistic filtering to extract candidate noun phrases. Could you point me to how to do this? Many thanks.

Thanks for the responses. Here is the code:

    library("openNLP")

    acq <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipeline and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing adjustments, which it did not explain. Reuter."

    acqTag <- tagPOS(acq)
    acqTagSplit <- strsplit(acqTag, " ")
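The (adj|noun)+noun filter can be expressed as a regular expression over the tag sequence: map each token to `A` (JJ, JJR, JJS), `N` (any NN* tag) or `O` (anything else), then match `[AN]+N`. Because there is exactly one character per token, match offsets are token indices. This is sketched in Python for consistency with the rest of this page; the same idea carries over to R's regex functions. `candidate_phrases` is a hypothetical helper, and the sample tags are hand-assigned for illustration rather than produced by `tagPOS`.

```python
import re


def candidate_phrases(tagged):
    """Return word sequences matching (adjective|noun)+ noun, i.e. at least two tokens."""
    # One code character per token, so regex offsets line up with token positions.
    codes = "".join(
        "N" if tag.startswith("NN") else "A" if tag.startswith("JJ") else "O"
        for _, tag in tagged
    )
    return [
        " ".join(word for word, _ in tagged[m.start():m.end()])
        for m in re.finditer(r"[AN]+N", codes)
    ]


if __name__ == "__main__":
    # Hand-tagged example (hypothetical tags).
    sample = [("post", "JJ"), ("closing", "NN"), ("adjustments", "NNS"), ("which", "WDT"),
              ("terminal", "NN"), ("operations", "NNS"), ("for", "IN"), ("sale", "NN")]
    print(candidate_phrases(sample))
```

A lone noun such as "sale" is not returned, since the pattern requires at least one modifier before the final noun.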