stanford-nlp

Is there a multilingual temporal expression tagger that can run on Hadoop?

大憨熊 submitted on 2019-12-08 11:53:59
Question: I need to extract dates from lots of text. The more languages the better; English, Spanish, and Portuguese at a minimum. Does such a tool exist, in Java and Mavenized? Here is what I have found so far:

- http://code.google.com/p/heideltime/ : many languages and an impressive online demo, but it requires some odd external dependencies that I suspect will make cluster deployment hard or impossible.
- http://nlp.stanford.edu/software/sutime.shtml : well documented, but English only. Easy to train?
- http://natty
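Since SUTime comes up in the question, here is a minimal Java sketch of running it through the CoreNLP annotation pipeline, closely following the SUTime demo that ships with CoreNLP (it assumes the stanford-corenlp jar and its models jar are on the classpath; the example sentence and reference date are made up). Note this only covers English out of the box, which is exactly the limitation raised above.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;
import java.util.List;
import java.util.Properties;

public class SUTimeSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    Annotation doc = new Annotation("The report is due two weeks from January 5, 2014.");
    // SUTime resolves relative expressions against the document date.
    doc.set(CoreAnnotations.DocDateAnnotation.class, "2014-01-01");
    pipeline.annotate(doc);

    // Each recognized temporal expression is returned as a CoreMap.
    List<CoreMap> timexes = doc.get(TimeAnnotations.TimexAnnotations.class);
    for (CoreMap t : timexes) {
      System.out.println(t + " -> " + t.get(TimeExpression.Annotation.class).getTemporal());
    }
  }
}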

Spark 2.0.1 write Error: Caused by: java.util.NoSuchElementException

跟風遠走 submitted on 2019-12-08 11:42:36
Question: I am trying to attach a sentiment value to each message, and I have added all the Stanford CoreNLP jar files as dependencies:

import sqlContext.implicits._
import com.databricks.spark.corenlp.functions._
import org.apache.spark.sql.functions._

val version = "3.6.0"
val model = s"stanford-corenlp-$version-models-english" //
val jars = sc.listJars
if (!jars.exists(jar => jar.contains(model))) {
  import scala.sys.process._
  s"wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/
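The spark-corenlp functions used above wrap the plain CoreNLP pipeline. For comparison, here is a minimal Java sketch of computing sentence-level sentiment with CoreNLP directly (it assumes the 3.6.0 core and English models jars are on the classpath; the example sentence is made up, and the sentiment annotation key has been renamed in older CoreNLP versions):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class SentimentSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // sentiment works off the binarized trees produced by the parse annotator
    props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The service was great but the food was terrible.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // One coarse sentiment label (e.g. Positive, Negative) per sentence.
      String label = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
      System.out.println(label + "\t" + sentence);
    }
  }
}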

How to parse a sentence that is multilingual?

被刻印的时光 ゝ submitted on 2019-12-08 08:36:41
Question: When I use the Stanford Parser to parse sentences like:

"Jirí Hubac 's script is a gem ."
"Absorbing character study by André Turpin ."

it raises an internal error. How do I deal with sentences like these that contain non-English characters?

Answer 1: Using the full Stanford CoreNLP toolkit available here: http://stanfordnlp.github.io/CoreNLP/ I ran this command:

java -Xmx6g -cp "stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -file
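For reference, the same annotator set as in that command line can also be configured programmatically; a minimal Java sketch follows (the example sentence is taken from the question, and the comment about encoding reflects the usual culprit with accented text rather than anything stated in the answer):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class ParsePipelineSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Accented characters are fine as long as the text reaches the pipeline
    // as proper Unicode (watch the file and console encoding).
    Annotation doc = new Annotation("Absorbing character study by André Turpin.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Print the constituency parse for each sentence.
      System.out.println(sentence.get(TreeCoreAnnotations.TreeAnnotation.class));
    }
  }
}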

How to find if a word in a sentence refers to a city

笑着哭i submitted on 2019-12-08 08:15:31
Question: How do I find whether a word in a sentence refers to a city? For example:

I live in San Francisco
I work in San Jose
I was born in New York

Is there a way to find that "San Francisco" is a city in the sentences above?

Answer 1: The task of recognising possibly multi-word expressions that refer to entities of various specific types (locations, but also organisations, dates, etc.) is called named-entity recognition (NER). For a simple task such as yours, existing freely available tools and models are sufficient.
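To make the answer concrete, here is a minimal Java sketch that runs the CoreNLP NER annotator and prints the label assigned to each token; with the stock English models, city names such as "San Francisco" come out as LOCATION (newer releases with fine-grained NER may label them CITY). The example text is taken from the question.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class CityNerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("I live in San Francisco. I was born in New York.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // NER label per token, e.g. "San" and "Francisco" both tagged LOCATION.
        String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
        System.out.println(token.word() + "\t" + ner);
      }
    }
  }
}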

Dependencies are null with the German Parser from Stanford CoreNLP

人盡茶涼 submitted on 2019-12-08 07:20:50
Question: I tried to parse German sentences with Stanford CoreNLP and the German models, version 3.6. The website says that dependency parsing is supported for German, but when I parse a sentence the dependencies are always null. I use the Scala script within DeepDive to run the NLP with the following properties:

val germanProps = new Properties()
germanProps.put("annotators", "tokenize, ssplit, pos, ner, parse")
germanProps.put("tokenize.language", "de")
germanProps.put("pos.model", "edu
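As a point of comparison (an assumption about the setup, since the question is truncated): the German models jar ships a default properties file that wires up all the German model paths, which avoids setting each one by hand. A minimal Java sketch:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class GermanPipelineSketch {
  public static void main(String[] args) {
    // Requires the stanford-corenlp German models jar (3.6) on the classpath.
    StanfordCoreNLP pipeline = new StanfordCoreNLP("StanfordCoreNLP-german.properties");

    Annotation doc = new Annotation("Die Sonne scheint heute in Berlin.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      SemanticGraph deps =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      // In 3.6 this can still be null if no German dependency conversion or model is
      // configured; the constituency tree is available either way via TreeAnnotation.
      System.out.println(deps);
    }
  }
}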

Is it possible to get the set of named-entity tokens that comprise a phrase?

时光总嘲笑我的痴心妄想 submitted on 2019-12-08 06:48:03
Question: I'm using the Stanford CoreNLP parsers to run through some text that contains date phrases, such as 'the second Monday in October' and 'the past year'. The library appropriately tags each token as a DATE named entity, but is there a way to get the whole date phrase programmatically? And it's not just dates; ORGANIZATION named entities behave the same way ("The International Olympic Committee", for example, could be one identified in a given text).

String content = "Thanksgiving, or
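One way to get whole multi-token entities rather than per-token tags is the entitymentions annotator, which groups consecutive tokens carrying the same NER label into a single mention (available in recent CoreNLP releases). A minimal Java sketch, with an example phrase borrowed from the question:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class EntityMentionSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // entitymentions runs after ner and merges same-label token runs into mentions.
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation(
        "The International Olympic Committee meets on the second Monday in October.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreMap mention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
        // Prints the full mention text and its entity type, e.g. ORGANIZATION or DATE.
        System.out.println(mention.get(CoreAnnotations.TextAnnotation.class) + "\t"
            + mention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
      }
    }
  }
}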

Spanish POS tagging with Stanford NLP - is it possible to get the person/number/gender?

无人久伴 submitted on 2019-12-08 06:20:13
Question: I'm using Stanford NLP to do POS tagging for Spanish texts. I can get a POS tag for each word, but I notice that I am only given the first four sections of the AnCora tag; the last three sections for person, number, and gender are missing. Why does Stanford NLP only use a reduced version of the AnCora tag, and is it possible to get the entire tag? Here is my code (please excuse the JRuby...):

props = java.util.Properties.new()
props.put("tokenize.language", "es")
props.put("
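For what it's worth, the Spanish tagger models are trained on a simplified form of the AnCora tagset, so the stock models do not emit the full morphological fields (this is the usual explanation; check the documentation of the release you use). Below is a minimal Java sketch that loads the Spanish defaults shipped in the Spanish models jar and prints the reduced tags; the example sentence is made up:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class SpanishPosSketch {
  public static void main(String[] args) {
    // Loads the defaults bundled in the Spanish models jar (must be on the classpath).
    StanfordCoreNLP pipeline = new StanfordCoreNLP("StanfordCoreNLP-spanish.properties");

    Annotation doc = new Annotation("María vive en Madrid con sus dos hermanas.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // Prints the reduced AnCora-style tag rather than a fully specified one
        // with person/number/gender.
        System.out.println(token.word() + "\t"
            + token.get(CoreAnnotations.PartOfSpeechAnnotation.class));
      }
    }
  }
}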

Unable to use Stanford NER in a Python module

旧时模样 submitted on 2019-12-08 05:26:22
Question: I want to use the Python Stanford NER module but keep getting an error. I searched the internet but found nothing. Here is the basic usage with the error:

import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")

Error:

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    tagger.get_entities("University of California is located in California, United States")
  File "C:\Python27\lib\site
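One thing worth checking (an assumption based on how the ner package works, not something visible in the truncated traceback): ner.HttpNER and ner.SocketNER are thin clients that expect a Stanford NER service to already be listening on the given host and port, so the call fails if nothing is running on localhost:8080. The socket server bundled with the Stanford NER distribution can be started roughly like this (paths are illustrative, and the socket server pairs with ner.SocketNER rather than ner.HttpNER):

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 8080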

Extract noun phrase using Stanford NLP

一个人想着一个人 submitted on 2019-12-08 04:50:59
Question: I am trying to find the theme/noun phrase in a sentence using Stanford NLP. For example, for the sentence "the white tiger" I would like to get the theme/noun phrase "white tiger". For this I used the POS tagger; my sample code is below. The result I am getting is "tiger", which is not correct. The sample code I used is:

public static void main(String[] args) throws IOException {
  Properties props = new Properties();
  props.setProperty("annotators", "tokenize,ssplit,parse");
  StanfordCoreNLP pipeline = new
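A common way to pull noun phrases out of the constituency tree is to match NP nodes with Tregex; the following minimal Java sketch takes that approach (it is one option, not necessarily how the original question was answered). For "the white tiger" it prints the full NP including the determiner; dropping determiners or matching a more specific Tregex pattern is a separate refinement.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class NounPhraseSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("the white tiger");
    pipeline.annotate(doc);

    TregexPattern npPattern = TregexPattern.compile("NP");
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      TregexMatcher matcher = npPattern.matcher(tree);
      while (matcher.find()) {
        // Rebuild the surface form of each matched NP from its leaves.
        StringBuilder phrase = new StringBuilder();
        for (Tree leaf : matcher.getMatch().getLeaves()) {
          if (phrase.length() > 0) phrase.append(' ');
          phrase.append(leaf.value());
        }
        System.out.println(phrase);
      }
    }
  }
}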

Remove standard English-language stop words in the Stanford Topic Modeling Toolbox

这一生的挚爱 submitted on 2019-12-08 03:54:45
Question: I am using the Stanford Topic Modeling Toolbox 0.4.0 for LDA. I noticed that to remove standard English stop words I can use StopWordFilter("en") as the last step of the tokenizer, but how do I use it?

import scalanlp.io._;
import scalanlp.stage._;
import scalanlp.stage.text._;
import scalanlp.text.tokenize._;
import scalanlp.pipes.Pipes.global._;
import edu.stanford.nlp.tmt.stage._;
import edu.stanford.nlp.tmt.model.lda._;
import edu.stanford.nlp.tmt.model.llda._;

val source