information-extraction | 易学教程

Ignore some part of input when parsing with ANTLR

Question: I'm trying to parse a language with ANTLR (ANTLRWorks 3.5.2). The goal is to feed in the complete input and have ANTLR produce a parse tree for the parts defined in the grammar while ignoring the rest of the input. For example, this is my grammar:

    grammar asap;
    project : '/begin PROJECT' name module+ '/end PROJECT';
    module  : '/begin MODULE' name '/end MODULE';
    name    : IDENT;
    IDENT   : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.'|':'|'-')*;

Given input:

    /begin PROJECT HybridSailboat_2
    /begin MODULE engine
    /begin A2ML

How to use PoS tag as a feature for training data by Naive Bayes classifier?

Question: I'm researching how to extract keyphrases from documents for my thesis. I use a Naive Bayes classifier to train a model on the features of candidate terms. One of the features is the PoS tag, which I think is important for deciding whether a term is a keyphrase or not. But the input to a Naive Bayes (NB) classifier is numeric and the PoS tag is a string, so I don't know how to represent the PoS tag as a number so that it can serve as an input feature for NB
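One common way to turn a categorical feature such as a PoS tag into numbers is one-hot encoding: one vector position per tag, with a 1 at the position of the observed tag. The sketch below illustrates that idea only; the class name `PosTagEncoder` and the short tag list are assumptions made for the example, not part of the question. Note also that Naive Bayes can work on categorical features directly, by estimating P(tag | class) from counts, so encoding is a convenience rather than a requirement.

```java
import java.util.Arrays;
import java.util.List;

/** One-hot encoding of a PoS tag over a fixed, hypothetical tag inventory. */
final class PosTagEncoder {
    // Assumed tag set for illustration; a real tagger defines its own inventory.
    static final List<String> TAGS = Arrays.asList("NN", "NNS", "NNP", "JJ", "VB", "OTHER");

    /** Returns a vector with a single 1.0 at the tag's index; unknown tags map to OTHER. */
    static double[] oneHot(String tag) {
        double[] vector = new double[TAGS.size()];
        int index = TAGS.indexOf(tag);
        if (index < 0) {
            index = TAGS.indexOf("OTHER");
        }
        vector[index] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        // Prints the encoded vector for an adjective tag.
        System.out.println(Arrays.toString(oneHot("JJ")));
    }
}
```

Mapping tags to a single integer (NN = 1, JJ = 2, and so on) would imply an ordering between tags that does not exist, which is why the one-hot form is usually preferred when the classifier expects numeric input.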

Algorithm to extract simple sentences from complex (mixed) sentences?

Question: Is there an algorithm that can be used to extract simple sentences from paragraphs? My ultimate goal is to later run another algorithm on the resulting simple sentences to determine the author's sentiment. I've researched this in sources such as Chae-Deug Park, but none discuss preparing simple sentences as training data. Thanks in advance.

Answer 1: I have just used OpenNLP for the same purpose.

    public static List<String> breakIntoSentencesOpenNlp(String paragraph) throws FileNotFoundException, IOException
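The answer's OpenNLP method is cut off above, so the sketch below does not reproduce it. As a stand-in, it splits a paragraph into sentences with the JDK's own `java.text.BreakIterator`; the class name `SentenceSplitter` is an assumption for the example. It only finds sentence boundaries; it does not break a complex sentence into simpler clauses.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Paragraph-to-sentence splitting with the JDK's BreakIterator (not OpenNLP). */
final class SentenceSplitter {

    /** Returns the trimmed sentences of the paragraph, in order. */
    static List<String> split(String paragraph) {
        List<String> sentences = new ArrayList<>();
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(paragraph);
        int start = iterator.first();
        // Each call to next() returns the end offset of the following sentence.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String sentence = paragraph.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("The food was great. The service was slow.")) {
            System.out.println(s);
        }
    }
}
```

A rule-based iterator like this can stumble on abbreviations, so a trained sentence detector such as the one the answer mentions is likely the better choice for noisy text.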

NLP to find relationship between entities

Question: My current understanding is that it's possible to extract entities from a text document using toolkits such as OpenNLP or Stanford NLP. However, is there a way to find relationships between these entities? For example, consider the following text: "As some of you may know, I spent last week at CERN, the European high-energy physics laboratory where the famous Higgs boson was discovered last July. Every time I go to CERN I feel a deep sense of reverence. Apart from quick visits over the years, I

Parser to parse search terms and extract valuable information [closed]

Question: I would like to understand the search term of a user. Suppose someone searches for "staples in NY": I would like to understand that it's a location search where the keyword is staples and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag that as a location search,

Using Conditional Random Fields for Named Entity Recognition

Question: What is a Conditional Random Field? How exactly does a Conditional Random Field identify proper names as person, organization, or place in structured or unstructured text? For example: This product is ordered by StackOverFlow Inc. What does a Conditional Random Field do to identify StackOverFlow Inc. as an organization?

Answer 1: A CRF is a discriminative, batch, tagging model, in the same general family as a Maximum Entropy Markov model. A full explanation is book-length. A short explanation is as
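For reference, the standard linear-chain CRF can be written as follows. This is the textbook form, not necessarily the explanation the truncated answer goes on to give. Here x is the token sequence, y the label sequence, f_k are feature functions, and λ_k their learned weights.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \right)
```

As an illustration for the example sentence, a feature could fire when the current token is capitalized, the following token is "Inc.", and the current label is ORGANIZATION. If training assigns that feature a large positive weight, label sequences that tag "StackOverFlow Inc." as an organization score higher than the alternatives.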

Lucene Entity Extraction

Given a finite dictionary of entity terms, I'm looking for a way to do entity extraction with intelligent tagging using Lucene. Currently I've been able to use Lucene for:

- Searching for complex phrases with some fuzziness
- Highlighting results

However, I'm not aware of how to:

- Get accurate offsets of the matched phrases
- Do entity-specific annotations per match (not just tags for every single hit)

I have tried using the explain() method, but this only gives the terms in the query that got the hit, not the offsets of the hit within the original text. Has anybody faced a similar problem and
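To make the two missing pieces concrete (character offsets and a per-match entity type), here is a minimal dictionary tagger in plain Java. It deliberately does not use Lucene and does no fuzzy matching; the names `DictionaryTagger` and `Match` are assumptions for the example. It only shows the shape of the output the question asks for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Exact, case-insensitive dictionary matching that reports offsets (not Lucene-based). */
final class DictionaryTagger {

    /** One hit: the entity type plus [start, end) character offsets in the original text. */
    static final class Match {
        final String type;
        final int start;
        final int end;

        Match(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** Finds every whole-word occurrence of each dictionary phrase (phrase -> entity type). */
    static List<Match> tag(String text, Map<String, String> dictionary) {
        List<Match> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String phrase = entry.getKey();
            if (phrase.isEmpty()) {
                continue;
            }
            for (int i = 0; i + phrase.length() <= text.length(); i++) {
                int end = i + phrase.length();
                // regionMatches compares in place, so offsets refer to the original text.
                if (text.regionMatches(true, i, phrase, 0, phrase.length())
                        && isBoundary(text, i - 1) && isBoundary(text, end)) {
                    matches.add(new Match(entry.getValue(), i, end));
                }
            }
        }
        return matches;
    }

    /** True when the position is outside the text or not a letter or digit. */
    private static boolean isBoundary(String text, int index) {
        return index < 0 || index >= text.length() || !Character.isLetterOrDigit(text.charAt(index));
    }
}
```

Each `Match` carries its own type, so two hits in the same document can be annotated differently. The scan is quadratic in the worst case; a large dictionary would call for a trie or an Aho-Corasick automaton instead.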

Parser to parse search terms and extract valuable information [closed]

I would like to understand the search term of a user. Suppose someone searches for "staples in NY": I would like to understand that it's a location search where the keyword is staples and the location is New York. Similarly, if someone types "cat in hat", the parser should not flag that as a location search; here the entire keyword is "cat in hat". Is there any algorithm or open source library available to parse a search term and understand whether it's a comparison (like A vs B) or a location-based search (like A in X)?

The problem you describe is called information extraction. A host of
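The distinction in the question can be sketched with a lookup list of known places (a gazetteer): "A in X" counts as a location search only when X is a known location. This is a toy illustration, not the approach the truncated answer recommends; `QueryClassifier`, the three-entry location set, and the string result format are all assumptions for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy query classifier: comparison ("A vs B"), location ("A in X"), or plain keyword. */
final class QueryClassifier {
    // Hypothetical gazetteer; a real system would load a much larger list of place names.
    static final Set<String> LOCATIONS = new HashSet<>(Arrays.asList("ny", "new york", "boston"));

    static String classify(String query) {
        String q = query.trim().toLowerCase(Locale.ROOT);

        int vs = q.indexOf(" vs ");
        if (vs > 0) {
            return "COMPARISON[" + q.substring(0, vs) + " | " + q.substring(vs + 4) + "]";
        }

        // Treat "A in X" as a location search only when X is a known location.
        int in = q.lastIndexOf(" in ");
        if (in > 0 && LOCATIONS.contains(q.substring(in + 4))) {
            return "LOCATION[keyword=" + q.substring(0, in) + ", location=" + q.substring(in + 4) + "]";
        }

        return "KEYWORD[" + q + "]";
    }

    public static void main(String[] args) {
        System.out.println(classify("staples in NY"));
        System.out.println(classify("cat in hat"));
    }
}
```

With this set, "staples in NY" is classified as a location search, while "cat in hat" stays a single keyword because "hat" is not in the gazetteer.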

Ignore some part of input when parsing with ANTLR

I'm trying to parse a language with ANTLR (ANTLRWorks 3.5.2). The goal is to feed in the complete input and have ANTLR produce a parse tree for the parts defined in the grammar while ignoring the rest of the input. For example, this is my grammar:

    grammar asap;
    project : '/begin PROJECT' name module+ '/end PROJECT';
    module  : '/begin MODULE' name '/end MODULE';
    name    : IDENT;
    IDENT   : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.'|':'|'-')*;

Given input:

    /begin PROJECT HybridSailboat_2
    /begin MODULE engine
    /begin A2ML
    /include XCP_common_v1_0.aml
    "XCP" struct { taggedstruct Common_Parameters ; };
    /end A2ML
    /end MODULE
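The behaviour being asked for, keeping the blocks the grammar knows and skipping everything else, can be sketched by hand in plain Java. This is not an ANTLR grammar and not the accepted solution; `BlockExtractor` and its whitespace tokenization are assumptions for the example. It keeps the names of PROJECT and MODULE blocks and skips any other `/begin X ... /end X` block in full.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hand-written scan that keeps known blocks and skips unknown ones (not ANTLR). */
final class BlockExtractor {
    static final Set<String> KNOWN = new HashSet<>(Arrays.asList("PROJECT", "MODULE"));

    /** Returns entries such as "MODULE engine" for every known block, in input order. */
    static List<String> extract(String input) {
        String[] tokens = input.trim().split("\\s+");
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (tokens[i].equals("/begin") && i + 1 < tokens.length) {
                String kind = tokens[i + 1];
                if (KNOWN.contains(kind)) {
                    if (i + 2 < tokens.length) {
                        found.add(kind + " " + tokens[i + 2]);
                    }
                    i += 3; // consume "/begin", the kind, and the name
                } else {
                    i = skipBlock(tokens, i);
                }
            } else {
                i++; // "/end" markers of known blocks and stray tokens are ignored
            }
        }
        return found;
    }

    /** Returns the index just past the matching "/end <kind>", or the end of input. */
    private static int skipBlock(String[] tokens, int begin) {
        String kind = tokens[begin + 1];
        int i = begin + 2;
        while (i + 1 < tokens.length && !(tokens[i].equals("/end") && tokens[i + 1].equals(kind))) {
            i++;
        }
        return Math.min(i + 2, tokens.length);
    }
}
```

The sketch does not handle an unknown block nested inside another block of the same kind, and it does not validate that known blocks are closed. Inside ANTLR itself, the usual counterpart is a catch-all lexer rule that consumes unrecognized input and discards or hides it, so the parser only ever sees the tokens it has rules for.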