NLP

Learning NER using a category list

Submitted by 狂风中的少年 on 2019-12-25 05:19:17
Question: In the template for training CRF++, how can I include a custom dictionary.txt file of listed companies, another of popular European foods, for example, or just about any category? I would then provide sample training data for each category, from which it learns how those specific named entities are used in context for that category. That way both I and the system can be sure it has correctly understood how certain named entities are structured in a text, whether a tweet or a Pulitzer-prize-winning…
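One common approach (a sketch, not CRF++-specific syntax) is to pre-compute a gazetteer-membership column in the CoNLL-style training file, which a CRF++ template can then reference as an extra feature column via its `%x[row,col]` macros. The helper name and the COMPANY/O flag values below are illustrative assumptions:

```python
# Sketch: add a gazetteer-membership column to CoNLL-style training rows so
# that a CRF++ template can reference it as a feature (e.g. %x[0,1]).
# `add_gazetteer_column` and the COMPANY/O tags are illustrative, not CRF++ API.
def add_gazetteer_column(rows, gazetteer):
    """rows: (token, label) pairs -> (token, dict_flag, label) triples."""
    out = []
    for token, label in rows:
        flag = "COMPANY" if token.lower() in gazetteer else "O"
        out.append((token, flag, label))
    return out

gazetteer = {"google", "siemens"}  # loaded from a custom dictionary.txt
rows = [("Google", "B-ORG"), ("hired", "O"), ("me", "O")]
for token, flag, label in add_gazetteer_column(rows, gazetteer):
    print(token, flag, label)
```

The CRF learner then weighs the dictionary flag like any other observed feature, rather than trusting the dictionary outright.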

NER CRF, Exception in thread “main” java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory [duplicate]

Submitted by 蹲街弑〆低调 on 2019-12-25 04:44:05
Question: This question already has answers here: Why am I getting a NoClassDefFoundError in Java? (23 answers). Closed 3 years ago. I have downloaded the latest version of the NER from this link. After extracting it, I ran this command: java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop austen.prop This is not working and I get the following exception: CRFClassifier invoked on Mon Jul 25 06:56:22 EDT 2016 with arguments: -prop austen.prop Exception in thread "main" java.lang…

Failed to execute goal

Submitted by 老子叫甜甜 on 2019-12-25 04:29:11
Question: I'm new to Maven. mvn clean worked successfully, but after mvn package I got the following error: [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.3:compile (default-compile) on project L: Compilation failure: Compilation failure: [ERROR] /home/user/L/src/main/java/edu/stanford/nlp/pipeline/NLP.java:[4,34] package edu.stanford.nlp.neural.rnn does not exist [ERROR] [ERROR] /home/user/L/src/main/java/edu/stanford/nlp/pipeline/NLP.java:[6,32] cannot find…

How to translate a syntactic parse to a dependency parse tree?

Submitted by ╄→гoц情女王★ on 2019-12-25 04:26:51
Question: Using Link Grammar I can obtain the syntactic parse of sentences, for example for "a koala is a cute animal" (the ASCII link diagrams, with links such as Xp, WV, Wd, Ost, Ss, Ds**c, PHc and A connecting LEFT-WALL, a, koala.n, is.v, a, cute.a, animal.n and the final period, were garbled by the flat-text extraction and are omitted here)…

How can I get the words before and after a specific token?

Submitted by 空扰寡人 on 2019-12-25 03:53:38
Question: I am currently working on a project that simply builds basic corpus databases and tokenizes texts, but I seem to be stuck on one point. Assume we have the following: import os, re texts = [] for i in os.listdir(somedir): # somedir contains text files with very large plain texts. with open(i, 'r') as f: texts.append(f.read()) Now I want to find the word before and after a token. myToken = 'blue' found = [] for i in texts: fnd = re.findall('[a-zA-Z0-9]+ %s [a-zA-Z0-9]+|\. %s [a-zA…
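The idea in the question (an alternation pattern with `%s` placeholders) can be written as a single interpolated regex; `neighbors` below is an illustrative helper, not part of any library:

```python
import re

def neighbors(text, token):
    """Return (word_before, word_after) pairs for each occurrence of token.

    Occurrences at the very start or end of the text have no neighbour on
    one side and are simply not matched by this pattern.
    """
    pattern = re.compile(r'(\w+)\W+%s\W+(\w+)' % re.escape(token))
    return pattern.findall(text)

print(neighbors("the big blue whale and a blue sky", "blue"))
# → [('big', 'whale'), ('a', 'sky')]
```

`re.escape` keeps the search safe if the token contains regex metacharacters, and `\W+` between the groups tolerates punctuation as well as spaces.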

NLTK PorterStemmer missing positional argument

Submitted by 风流意气都作罢 on 2019-12-25 02:44:59
Question: I have been experimenting with NLTK, and I do not understand what my mistake is. I tried this: from nltk.stem import PorterStemmer stemmer = PorterStemmer examples = ["cars", "eating", "quickly"] for w in examples: print(stemmer.stem(w)) And Python returns this: TypeError: stem() missing 1 required positional argument: 'word' Could anyone explain what I am doing wrong? Thanks in advance! Answer 1: Add () to PorterStemmer, since it is a class that must be instantiated, and it should work: from nltk.stem…
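Assuming NLTK is installed, the corrected snippet from the answer looks like this in full (without the parentheses, `stemmer` is the class itself, so `stem` is called as an unbound method and `w` fills the `self` slot, leaving `word` missing):

```python
# Corrected version: PorterStemmer must be instantiated with () before
# calling .stem() (assumes the nltk package is installed).
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()  # note the parentheses: an instance, not the class
for w in ["cars", "eating", "quickly"]:
    print(stemmer.stem(w))
```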

Convert numbers to English strings

Submitted by 半世苍凉 on 2019-12-25 02:29:06
Question: Websites like http://www.easysurf.cc/cnvert18.htm and http://www.calculatorsoup.com/calculators/conversions/numberstowords.php try to convert a numerical string into an English string, but they are not giving natural-sounding output. For example, on http://www.easysurf.cc/cnvert18.htm: [in]: 100456 [out]: one hundred thousand four hundred fifty-six This website is a little better, http://www.calculator.org/calculate-online/mathematics/text-number.aspx: [in]: 100456 [out]: one hundred thousand,…
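As a baseline for the style of output the first site produces, a self-contained converter for numbers below one million can be sketched in plain Python (the function names are my own, and no "and" or commas are inserted):

```python
# Minimal number-to-words sketch for 0 <= n < 1,000,000.
UNITS = ("zero one two three four five six seven eight nine ten eleven "
         "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
         "nineteen").split()
TENS = "zero ten twenty thirty forty fifty sixty seventy eighty ninety".split()

def under_hundred(n):
    if n < 20:
        return UNITS[n]
    word = TENS[n // 10]
    return word + ("-" + UNITS[n % 10] if n % 10 else "")

def under_thousand(n):
    if n < 100:
        return under_hundred(n)
    word = UNITS[n // 100] + " hundred"
    return word + (" " + under_hundred(n % 100) if n % 100 else "")

def to_words(n):
    """English words for 0 <= n < 1,000,000 (hyphenated tens, no 'and')."""
    if n < 1000:
        return under_thousand(n)
    word = under_thousand(n // 1000) + " thousand"
    return word + (" " + under_thousand(n % 1000) if n % 1000 else "")

print(to_words(100456))  # → one hundred thousand four hundred fifty-six
```

Making the output more natural ("a hundred thousand, four hundred and fifty-six") is then a matter of post-processing rules on top of this skeleton.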

Loading and editing a cfg file for grammar parsing

Submitted by 非 Y 不嫁゛ on 2019-12-25 01:23:13
Question: I am following the steps mentioned at http://www.nltk.org/book/ch10.html to load and parse data using a CFG file. When I use the code below I don't face any issue: cp = load_parser('grammars/book_grammars/sql0.fcfg') query = 'What cities are located in China' trees = list(cp.parse(query.split())) answer = trees[0].label()['SEM'] answer = [s for s in answer if s] q = ' '.join(answer) print(q) What I wish to do is take sql0.fcfg out, make changes to it, and load it into the parser again…
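One way to avoid the edit-and-reload-from-package cycle is to keep the grammar in a string and rebuild the parser after every change. This minimal sketch uses a toy CFG rather than sql0.fcfg; for feature grammars like the .fcfg files, `nltk.grammar.FeatureGrammar.fromstring` works the same way (assumes NLTK is installed):

```python
# Sketch: hold the grammar as an editable string and rebuild the parser
# after each change, instead of re-reading a file from the nltk_data package.
import nltk

grammar_text = """
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'a' | 'an'
N -> 'koala' | 'animal'
V -> 'is'
"""
grammar = nltk.CFG.fromstring(grammar_text)  # re-run after editing the string
parser = nltk.ChartParser(grammar)
trees = list(parser.parse("a koala is an animal".split()))
print(trees[0])
```

Alternatively, the edited grammar can be written to a local file and passed to `load_parser` with a `file:` resource URL, so it no longer has to live inside nltk_data.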

Training Stanford NER CRF: controlling the number of iterations and the regularisation (L1, L2) parameters

Submitted by China☆狼群 on 2019-12-24 23:15:53
Question: I was looking through the StanfordNER documentation/FAQ but I can't find anything related to specifying the maximum number of training iterations, or the values of the L1 and L2 regularisation parameters. I saw an answer in which it was suggested to set, for instance, maxIterations=10 in the properties file, but that did not give any results. Is it possible to set these parameters? Answer 1: I had to dig into the code but found it; basically StanfordNER supports many different numerical…
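For the regularisation side, a hedged sketch of a properties-file fragment: `sigma` and `tolerance` are fields of Stanford's SeqClassifierFlags, but their exact interaction with the chosen optimiser should be checked against the NER FAQ and the SeqClassifierFlags javadoc:

```
# larger sigma = weaker L2 regularisation (variance of the Gaussian prior)
sigma = 3.0
# convergence tolerance that ends the numerical optimisation
tolerance = 1e-4
```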

How do I extract contents from a koRpus object in R?

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-24 21:13:40
Question: I'm using the tm package and looking to get the Flesch-Kincaid scores for a document using R. I found that the koRpus package has a lot of metrics, including reading level, and started using that. However, the returned object is a very complicated S4 object that I don't understand how to parse. So, I apply this to my corpus: txt <- system.file("texts", "txt", package = "tm") (d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat"))) f <- function(x) tokenize…