stanford-nlp

Is there a multilingual temporal expression tagger that can run on Hadoop?

大憨熊 submitted on 2019-12-08 11:53:59
Question: I need to extract dates from lots of text. The more languages the better; English, Spanish, and Portuguese at a minimum. Does such a tool exist, in Java and Mavenized? Here is what I have found so far:

- http://code.google.com/p/heideltime/ : many languages and an impressive online demo, but it requires some odd external dependencies that I suspect will make cluster deployment hard or impossible.
- http://nlp.stanford.edu/software/sutime.shtml : well documented, but English only. Easy to train?
- http://natty
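Since SUTime comes up in the question, here is a minimal Java sketch of running it through the CoreNLP annotation pipeline, closely following the SUTime demo that ships with CoreNLP (it assumes the stanford-corenlp jar and its models jar are on the classpath; the example sentence and reference date are made up). Note this only covers English out of the box, which is exactly the limitation raised above.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;
import java.util.List;
import java.util.Properties;

public class SUTimeSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    Annotation doc = new Annotation("The report is due two weeks from January 5, 2014.");
    // SUTime resolves relative expressions against the document date.
    doc.set(CoreAnnotations.DocDateAnnotation.class, "2014-01-01");
    pipeline.annotate(doc);

    // Each recognized temporal expression is returned as a CoreMap.
    List<CoreMap> timexes = doc.get(TimeAnnotations.TimexAnnotations.class);
    for (CoreMap t : timexes) {
      System.out.println(t + " -> " + t.get(TimeExpression.Annotation.class).getTemporal());
    }
  }
}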

Spark 2.0.1 write Error: Caused by: java.util.NoSuchElementException

跟風遠走 submitted on 2019-12-08 11:42:36
Question: I am trying to attach a sentiment value to each message, and I have added all the Stanford CoreNLP jar files as dependencies:

import sqlContext.implicits._
import com.databricks.spark.corenlp.functions._
import org.apache.spark.sql.functions._

val version = "3.6.0"
val model = s"stanford-corenlp-$version-models-english" //
val jars = sc.listJars
if (!jars.exists(jar => jar.contains(model))) {
  import scala.sys.process._
  s"wget http://repo1.maven.org/maven2/edu/stanford/nlp/stanford-corenlp/
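The spark-corenlp functions used above wrap the plain CoreNLP pipeline. For comparison, here is a minimal Java sketch of computing sentence-level sentiment with CoreNLP directly (it assumes the 3.6.0 core and English models jars are on the classpath; the example sentence is made up, and the sentiment annotation key has been renamed in older CoreNLP versions):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class SentimentSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // sentiment works off the binarized trees produced by the parse annotator
    props.setProperty("annotators", "tokenize,ssplit,pos,parse,sentiment");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("The service was great but the food was terrible.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // One coarse sentiment label (e.g. Positive, Negative) per sentence.
      String label = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
      System.out.println(label + "\t" + sentence);
    }
  }
}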

How to parse a sentence that is multilingual?

被刻印的时光 ゝ submitted on 2019-12-08 08:36:41
Question: When I use the Stanford Parser to parse sentences like:

"Jirí Hubac 's script is a gem ."
"Absorbing character study by André Turpin ."

it raises an internal error. How do I deal with sentences like these that contain non-English characters?

Answer 1: Using the full Stanford CoreNLP toolkit available here: http://stanfordnlp.github.io/CoreNLP/ I ran this command:

java -Xmx6g -cp "stanford-corenlp-full-2015-12-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner,parse -file
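For reference, the same annotator set as in that command line can also be configured programmatically; a minimal Java sketch follows (the example sentence is taken from the question, and the comment about encoding reflects the usual culprit with accented text rather than anything stated in the answer):

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class ParsePipelineSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    // Accented characters are fine as long as the text reaches the pipeline
    // as proper Unicode (watch the file and console encoding).
    Annotation doc = new Annotation("Absorbing character study by André Turpin.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Print the constituency parse for each sentence.
      System.out.println(sentence.get(TreeCoreAnnotations.TreeAnnotation.class));
    }
  }
}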

How to find if a word in a sentence refers to a city

笑着哭i submitted on 2019-12-08 08:15:31
Question: How do I find whether a word in a sentence refers to a city? For example:

I live in San Francisco
I work in San Jose
I was born in New York

Is there a way to find that "San Francisco" is a city in the sentences above?

Answer 1: The task of recognising possibly multi-word expressions that refer to entities of various specific types (locations, but also organisations, dates, etc.) is called named-entity recognition (NER). For a simple task such as yours, existing freely available tools and models are sufficient.
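To make the answer concrete, here is a minimal Java sketch that runs the CoreNLP NER annotator and prints the label assigned to each token; with the stock English models, city names such as "San Francisco" come out as LOCATION (newer releases with fine-grained NER may label them CITY). The example text is taken from the question.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class CityNerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("I live in San Francisco. I was born in New York.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // NER label per token, e.g. "San" and "Francisco" both tagged LOCATION.
        String ner = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
        System.out.println(token.word() + "\t" + ner);
      }
    }
  }
}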

Dependencies are null with the German Parser from Stanford CoreNLP

人盡茶涼 submitted on 2019-12-08 07:20:50
Question: I tried to parse German sentences with Stanford CoreNLP and the German models, version 3.6. The website says that dependency parsing is supported for German, but when I parse a sentence the dependencies are always null. I use the Scala script within DeepDive to run the NLP with the following properties:

val germanProps = new Properties()
germanProps.put("annotators", "tokenize, ssplit, pos, ner, parse")
germanProps.put("tokenize.language", "de")
germanProps.put("pos.model", "edu
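As a point of comparison (an assumption about the setup, since the question is truncated): the German models jar ships a default properties file that wires up all the German model paths, which avoids setting each one by hand. A minimal Java sketch:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;

public class GermanPipelineSketch {
  public static void main(String[] args) {
    // Requires the stanford-corenlp German models jar (3.6) on the classpath.
    StanfordCoreNLP pipeline = new StanfordCoreNLP("StanfordCoreNLP-german.properties");

    Annotation doc = new Annotation("Die Sonne scheint heute in Berlin.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      SemanticGraph deps =
          sentence.get(SemanticGraphCoreAnnotations.BasicDependenciesAnnotation.class);
      // In 3.6 this can still be null if no German dependency conversion or model is
      // configured; the constituency tree is available either way via TreeAnnotation.
      System.out.println(deps);
    }
  }
}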

Is it possible to get the set of named-entity tokens that comprise a phrase?

时光总嘲笑我的痴心妄想 submitted on 2019-12-08 06:48:03
Question: I'm using the Stanford CoreNLP parsers to run through some text that contains date phrases, such as 'the second Monday in October' and 'the past year'. The library appropriately tags each token as a DATE named entity, but is there a way to get the whole date phrase programmatically? And it's not just dates; ORGANIZATION named entities behave the same way ("The International Olympic Committee", for example, could be one identified in a given text).

String content = "Thanksgiving, or
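One way to get whole multi-token entities rather than per-token tags is the entitymentions annotator, which groups consecutive tokens carrying the same NER label into a single mention (available in recent CoreNLP releases). A minimal Java sketch, with an example phrase borrowed from the question:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class EntityMentionSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // entitymentions runs after ner and merges same-label token runs into mentions.
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation(
        "The International Olympic Committee meets on the second Monday in October.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreMap mention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
        // Prints the full mention text and its entity type, e.g. ORGANIZATION or DATE.
        System.out.println(mention.get(CoreAnnotations.TextAnnotation.class) + "\t"
            + mention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
      }
    }
  }
}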

Spanish POS tagging with Stanford NLP - is it possible to get the person/number/gender?

无人久伴 submitted on 2019-12-08 06:20:13
Question: I'm using Stanford NLP to do POS tagging for Spanish texts. I can get a POS tag for each word, but I notice that I am only given the first four sections of the AnCora tag; the last three sections for person, number, and gender are missing. Why does Stanford NLP only use a reduced version of the AnCora tag, and is it possible to get the entire tag? Here is my code (please excuse the JRuby...):

props = java.util.Properties.new()
props.put("tokenize.language", "es")
props.put("
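For what it's worth, the Spanish tagger models are trained on a simplified form of the AnCora tagset, so the stock models do not emit the full morphological fields (this is the usual explanation; check the documentation of the release you use). Below is a minimal Java sketch that loads the Spanish defaults shipped in the Spanish models jar and prints the reduced tags; the example sentence is made up:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

public class SpanishPosSketch {
  public static void main(String[] args) {
    // Loads the defaults bundled in the Spanish models jar (must be on the classpath).
    StanfordCoreNLP pipeline = new StanfordCoreNLP("StanfordCoreNLP-spanish.properties");

    Annotation doc = new Annotation("María vive en Madrid con sus dos hermanas.");
    pipeline.annotate(doc);

    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        // Prints the reduced AnCora-style tag rather than a fully specified one
        // with person/number/gender.
        System.out.println(token.word() + "\t"
            + token.get(CoreAnnotations.PartOfSpeechAnnotation.class));
      }
    }
  }
}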

Unable to use Stanford NER in a Python module

旧时模样 submitted on 2019-12-08 05:26:22
Question: I want to use the Python Stanford NER module but keep getting an error. I searched the internet but found nothing. Here is the basic usage with the error:

import ner
tagger = ner.HttpNER(host='localhost', port=8080)
tagger.get_entities("University of California is located in California, United States")

Error:

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    tagger.get_entities("University of California is located in California, United States")
  File "C:\Python27\lib\site
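One thing worth checking (an assumption based on how the ner package works, not something visible in the truncated traceback): ner.HttpNER and ner.SocketNER are thin clients that expect a Stanford NER service to already be listening on the given host and port, so the call fails if nothing is running on localhost:8080. The socket server bundled with the Stanford NER distribution can be started roughly like this (paths are illustrative, and the socket server pairs with ner.SocketNER rather than ner.HttpNER):

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 8080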

Extract noun phrase using Stanford NLP

一个人想着一个人 submitted on 2019-12-08 04:50:59
Question: I am trying to find the theme/noun phrase in a sentence using Stanford NLP. For example, for the sentence "the white tiger" I would like to get the theme/noun phrase "white tiger". For this I used the POS tagger; my sample code is below. The result I am getting is "tiger", which is not correct. The sample code I used is:

public static void main(String[] args) throws IOException {
  Properties props = new Properties();
  props.setProperty("annotators", "tokenize,ssplit,parse");
  StanfordCoreNLP pipeline = new
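A common way to pull noun phrases out of the constituency tree is to match NP nodes with Tregex; the following minimal Java sketch takes that approach (it is one option, not necessarily how the original question was answered). For "the white tiger" it prints the full NP including the determiner; dropping determiners or matching a more specific Tregex pattern is a separate refinement.

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.trees.tregex.TregexMatcher;
import edu.stanford.nlp.trees.tregex.TregexPattern;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class NounPhraseSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,parse");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    Annotation doc = new Annotation("the white tiger");
    pipeline.annotate(doc);

    TregexPattern npPattern = TregexPattern.compile("NP");
    for (CoreMap sentence : doc.get(CoreAnnotations.SentencesAnnotation.class)) {
      Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
      TregexMatcher matcher = npPattern.matcher(tree);
      while (matcher.find()) {
        // Rebuild the surface form of each matched NP from its leaves.
        StringBuilder phrase = new StringBuilder();
        for (Tree leaf : matcher.getMatch().getLeaves()) {
          if (phrase.length() > 0) phrase.append(' ');
          phrase.append(leaf.value());
        }
        System.out.println(phrase);
      }
    }
  }
}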

Remove standard English-language stop words in the Stanford Topic Modeling Toolbox

这一生的挚爱 submitted on 2019-12-08 03:54:45
Question: I am using the Stanford Topic Modeling Toolbox 0.4.0 for LDA. I noticed that to remove standard English stop words I can use StopWordFilter("en") as the last step of the tokenizer, but how do I use it?

import scalanlp.io._;
import scalanlp.stage._;
import scalanlp.stage.text._;
import scalanlp.text.tokenize._;
import scalanlp.pipes.Pipes.global._;
import edu.stanford.nlp.tmt.stage._;
import edu.stanford.nlp.tmt.model.lda._;
import edu.stanford.nlp.tmt.model.llda._;

val source