nlp

Python frameworks for NLP? [closed]

Submitted by 倖福魔咒の on 2019-12-24 17:00:41
Question: Closed as off-topic 3 years ago; not accepting answers. I am working on a project wherein I have to extract the following information from a set of articles (the articles could be on anything): People: find the names of any people present, like "Barack Obama"; Topic or related tags of the article, like "Parliament", "World Energy"; Company/Organisation: I should be able …
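The usual starting points for this kind of extraction in Python are NLTK and spaCy. As a minimal sketch (my own illustration, not code from the question), NLTK's built-in named-entity chunker can pull out people and organisations, assuming the standard NLTK data packages (punkt, averaged_perceptron_tagger, maxent_ne_chunker, words) are installed:

    import nltk

    def extract_entities(text):
        # Collect named entities by label; topics/tags would need a separate
        # approach (e.g. keyword extraction or topic modelling).
        entities = {"PERSON": set(), "ORGANIZATION": set(), "GPE": set()}
        for sentence in nltk.sent_tokenize(text):
            tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
            for subtree in nltk.ne_chunk(tagged):
                if isinstance(subtree, nltk.Tree) and subtree.label() in entities:
                    name = " ".join(word for word, tag in subtree.leaves())
                    entities[subtree.label()].add(name)
        return entities

    print(extract_entities("Barack Obama spoke to Parliament about world energy."))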

How to distinguish between added sentences and altered sentences with difflib and nltk?

Submitted by 有些话、适合烂在心里 on 2019-12-24 15:32:46
Question: Downloading this page and making a very minor edit to it, changing the first 65 in this paragraph to 68, I then run it through the following code to pull out the diffs:

    import bs4
    from bs4 import BeautifulSoup
    import urllib2
    import lxml.html as lh

    url = 'https://secure.ssa.gov/apps10/reference.nsf/links/02092016062645AM'
    response = urllib2.urlopen(url)
    content = response.read()  # get response as list of lines
    root = lh.fromstring(content)
    section1 = root.xpath("//div[@class = 'column-12']") …
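To separate genuinely new sentences from edited ones, one option (a sketch of my own, not code from the question) is to compare the two versions at sentence granularity and classify difflib's opcodes: 'insert' blocks are added sentences, 'replace' blocks are altered ones:

    import difflib
    import nltk

    def classify_changes(old_text, new_text):
        # Sentence-level comparison instead of line-level comparison.
        old_sents = nltk.sent_tokenize(old_text)
        new_sents = nltk.sent_tokenize(new_text)
        added, altered = [], []
        matcher = difflib.SequenceMatcher(None, old_sents, new_sents)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "insert":
                added.extend(new_sents[j1:j2])      # present only in the new version
            elif op == "replace":
                altered.extend(new_sents[j1:j2])    # existed before, but was edited
        return added, altered

    added, altered = classify_changes("The limit is 65. Other text.",
                                      "The limit is 68. Other text. A new sentence.")
    print(added)    # ['A new sentence.']
    print(altered)  # ['The limit is 68.']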

Stanford NNDep parser: java.lang.ArrayIndexOutOfBoundsException

Submitted by 强颜欢笑 on 2019-12-24 14:11:04
Question: After training a model, I'm trying to parse the test treebank. Unfortunately, this error keeps popping up:

    Loading depparse model file: nndep.model.txt.gz ...
    ###################
    #Transitions: 77
    #Labels: 38
    ROOTLABEL: root
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 25
        at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:663)
        at edu.stanford.nlp.parser.nndep.Classifier.preCompute(Classifier.java:637)
        at edu.stanford.nlp.parser.nndep.DependencyParser …

Binary Feature Extraction

Submitted by 孤者浪人 on 2019-12-24 13:52:40
Question: I am a beginner in feature extraction for natural language processing. I want to know how I can use a hashmap to extract features for a text. If each feature is a "key" in the hashmap and its value is the "value" (all the features are binary, 0 or 1), does that mean I need n hashmaps (where n is the number of words in the text), since I need to extract the features for each word? Am I right? Thanks in advance, Alice Answer 1: Yes, you can implement this with a hash map; however, depending …
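As an illustration (my own sketch, not taken from the answer): in Python the "hashmap" is a dict, and a single dict per document is enough for binary features, with feature names as keys and 1 as the value; any feature not present is implicitly 0:

    def binary_features(text, vocabulary=None):
        # One dict per document: keys are feature names, values are 1.
        # Features that are absent are simply not stored (implicitly 0).
        tokens = text.lower().split()   # a real system would use a proper tokenizer
        features = {"word=" + tok: 1 for tok in tokens}
        if vocabulary is not None:
            # Optionally restrict to a fixed, shared feature set.
            features = {f: v for f, v in features.items() if f in vocabulary}
        return features

    print(binary_features("the cat sat on the mat"))
    # {'word=the': 1, 'word=cat': 1, 'word=sat': 1, 'word=on': 1, 'word=mat': 1}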

Parsing either font style or block of paragraph in GATE

Submitted by 半腔热情 on 2019-12-24 12:47:13
Question: I have a Word document. I need to match a particular table section or heading section of it using GATE. I wondered whether there are any steps by which we can first check the font size or font style of a heading and then match the rest of the content until the next heading pattern repeats. Answer 1: GATE has only limited support for MS Word documents, provided by the Apache Tika and Apache POI libraries. I do not know about any free alternative... We have developed our own plugin (gate.DocumentFormat) for …
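Outside GATE, the same idea (check a heading's style, then collect content until the next heading) can be sketched in Python with python-docx; this is my own illustration of the approach, not a GATE solution, and the file name is hypothetical:

    from docx import Document  # pip install python-docx

    def sections_by_heading(path):
        # Group paragraphs under the heading that precedes them, using the
        # paragraph style name ("Heading 1", "Heading 2", ...) to detect headings.
        doc = Document(path)
        sections, current = {}, None
        for para in doc.paragraphs:
            if para.style.name.startswith("Heading"):
                current = para.text
                sections[current] = []
            elif current is not None:
                sections[current].append(para.text)
        return sections

    # for heading, body in sections_by_heading("report.docx").items():
    #     print(heading, len(body), "paragraphs")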

word_tokenize TypeError: expected string or buffer [closed]

Submitted by 孤街醉人 on 2019-12-24 12:42:35
Question: Closed as off-topic 4 years ago; not accepting answers. When calling word_tokenize I get the following error:

    File "C:\Python34\lib\site-packages\nltk\tokenize\punkt.py", line 1322, in _slices_from_text
        for match in self._lang_vars.period_context_re().finditer(text):
    TypeError: expected string or buffer

I have a large text file (1500.txt) from which I want to remove …
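The usual cause of this error is passing word_tokenize something other than a single string, for example a list of lines from readlines(). A minimal sketch of the safe pattern (my own illustration, not the asker's code), reading the whole file as one string first:

    import nltk

    # word_tokenize expects one string; a list of lines raises
    # "TypeError: expected string or buffer" here.
    with open("1500.txt") as f:
        text = f.read()              # one string, not a list of lines

    tokens = nltk.word_tokenize(text)
    print(tokens[:20])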

Drawing a flatten NLTK Parse Tree with NP chunks

Submitted by 本秂侑毒 on 2019-12-24 11:46:47
Question: I want to analyze sentences with NLTK and display their chunks as a tree. NLTK offers the method tree.draw() to draw a tree. The following code draws a tree for the sentence "the little yellow dog barked at the cat":

    import nltk

    sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
                ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
    pattern = "NP: {<DT>?<JJ>*<NN>}"
    NPChunker = nltk.RegexpParser(pattern)
    result = NPChunker.parse(sentence)
    result.draw()

The …
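If the goal is a flatter drawing in which each NP chunk shows plain words rather than (word, tag) pairs, one way (my own sketch of one possible reading of "flatten", not an answer from the thread) is to rebuild the tree before calling draw():

    from nltk.tree import Tree

    def flatten_chunks(tree):
        # Keep a single level: NP subtrees keep their label but drop the POS
        # tags, and tokens outside any chunk are reduced to the bare word.
        children = []
        for child in tree:
            if isinstance(child, Tree):
                children.append(Tree(child.label(), [word for word, tag in child.leaves()]))
            else:
                word, tag = child
                children.append(word)
        return Tree(tree.label(), children)

    flatten_chunks(result).draw()   # `result` is the chunked tree from the code above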

Find different realization of a word in a sentence string - Python

Submitted by 廉价感情. on 2019-12-24 11:23:33
Question: (This question is about string checking in general and not Natural Language Processing per se, but if you view it as an NLP problem, imagine it is a language that current analyzers cannot analyze; for simplicity's sake I'll use English strings as examples.) Let's say there are only 6 possible forms in which a word can be realized: the initial letter capitalized; its plural form with an "s"; its plural form with an "es"; capitalized + "es"; capitalized + "s"; the basic form without plural or …
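A minimal sketch of one way to handle this (my own illustration, assuming exactly the six surface forms listed in the question): generate the candidate realizations up front and test whether any of them occurs as a token of the sentence:

    def realizations(word):
        # The six forms from the question: base, base + "s", base + "es",
        # plus the initial-capitalized variant of each.
        base_forms = {word, word + "s", word + "es"}
        return base_forms | {form.capitalize() for form in base_forms}

    def occurs_in(word, sentence):
        tokens = sentence.split()   # a real tokenizer would also strip punctuation
        return any(form in tokens for form in realizations(word))

    print(realizations("box"))                      # e.g. {'box', 'boxs', 'boxes', 'Box', 'Boxs', 'Boxes'} (set order varies)
    print(occurs_in("box", "The Boxes are heavy"))  # True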

Why do I get this import error when I have the required DLLs?

Submitted by 人盡茶涼 on 2019-12-24 10:18:29
Question: Running

    from sklearn.feature_extraction.text import CountVectorizer

I get this error:

    from sklearn.feature_extraction.text import CountVectorizer
      File "C:\Users\Anaconda3\lib\site-packages\sklearn\__init__.py", line 57, in <module>
        from .base import clone
      File "C:\Users\Anaconda3\lib\site-packages\sklearn\base.py", line 12, in <module>
        from .utils.fixes import signature
      File "C:\Users\Anaconda3\lib\site-packages\sklearn\utils\__init__.py", line 11, in <module>
        from .validation import (as_float_array …

Context free grammar with feature structure in Python

Submitted by 為{幸葍}努か on 2019-12-24 10:10:31
Question: I am trying to generate sentences from a defined grammar with Python; to avoid agreement problems I used feature structures. This is the code I have written so far:

    >>> from __future__ import print_function
    >>> import nltk
    >>> from nltk.featstruct import FeatStruct
    >>> from nltk import grammar, parse
    >>> from nltk.parse.generate import generate
    >>> from nltk import CFG
    >>> g = """
    % start DP
    DP -> D[AGR=[NUM='sg', PERS=3, GND='m']] N[AGR=[NUM='sg', GND='m']]
    D[AGR=[NUM='sg', PERS=3, GND='f']] -> …
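For reference, here is a self-contained sketch of a small feature grammar with determiner-noun agreement (my own toy example, not the asker's grammar). NLTK's generate() utility is written for plain CFGs, so this sketch checks agreement by parsing with a feature chart parser instead of generating:

    from nltk.grammar import FeatureGrammar
    from nltk.parse import FeatureChartParser

    # Toy grammar: the determiner and noun must share the same AGR bundle.
    g = FeatureGrammar.fromstring("""
    % start DP
    DP -> D[AGR=?a] N[AGR=?a]
    D[AGR=[NUM=sg, GND=m]] -> 'le'
    D[AGR=[NUM=sg, GND=f]] -> 'la'
    N[AGR=[NUM=sg, GND=m]] -> 'chien'
    N[AGR=[NUM=sg, GND=f]] -> 'maison'
    """)

    parser = FeatureChartParser(g)
    print(list(parser.parse("le chien".split())))   # agreement holds: one parse
    print(list(parser.parse("la chien".split())))   # gender clash: no parse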