nlp | 易学教程

Run a GATE pipeline from inside a Java program without the GUI: build a Tomcat app with GATE

Question: I have built some plugin components for GATE and, in combination with the ANNIE tools, I am running a pipeline on the GATE platform. Does anyone know how I can run a pipeline from the console? I want to build a web application in Tomcat that takes plain text from a web page, passes it to the GATE pipeline I have built, and does something. So I need to run GATE from a simple Java program. How can this be done? Thanks in advance, and sorry for my poor grammar.

Answer 1: The GATE example code shows you how to

Natural Language Processing Solution in Java? [duplicate]

Question: (Closed as a duplicate 5 years ago; this question already has answers at "Is there a good natural language processing library [closed]" (3 answers).) Are there any packages in the Java world as good as Python's NLTK?

Answer 1: Two popular ones that I know of are GATE and OpenNLP.
Answer 2: LingPipe is also really nice.
Answer 3: Stanford has a very good collection of NLP tools.
Answer 4: For other JVM languages, see Scala NLP (Scala) and clojure-opennlp (Clojure).
Answer 5: ClearTK provides a Java framework for doing statistical NLP. Its

How to add new embeddings for unknown words in Tensorflow (training & pre-set for testing)

Question: I am curious how I can add a normally distributed random 300-dimensional vector (element type tf.float32) whenever I encounter a word that is not in the pre-trained vocabulary. I am using pre-trained GloVe word embeddings, but in some cases I encounter unknown words, and I want to create a normally distributed random word vector for each newly found unknown word. The problem is that, with my current setup, I use tf.contrib.lookup.index_table_from_tensor to convert from words to integers based on
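
The question is cut off before it shows the full TensorFlow setup, so the sketch below stays framework-agnostic: plain Python, no TensorFlow calls. It assumes the pre-trained vectors are available as a dict from word to a list of floats. The function name build_embedding_table, the toy vectors and the seed are hypothetical, chosen for illustration. Each unknown word gets one normally distributed random vector the first time it is seen, and every later occurrence reuses that same row.

```python
import random

EMBED_DIM = 300  # matches the 300-dimensional GloVe vectors in the question


def build_embedding_table(tokens, pretrained, dim=EMBED_DIM, std=1.0, seed=0):
    """Assign every distinct token a row index and a vector.

    Words found in `pretrained` reuse their pre-trained vector. Each unknown
    word gets a vector drawn once from N(0, std**2); later occurrences of the
    same word reuse that row instead of drawing a new vector.
    """
    rng = random.Random(seed)  # seeded so the random vectors are reproducible
    vocab = {}    # word -> row index
    vectors = []  # row index -> list of floats
    for word in tokens:
        if word in vocab:
            continue  # already assigned (known, or unknown but seen before)
        if word in pretrained:
            vec = [float(x) for x in pretrained[word]]
        else:
            vec = [rng.gauss(0.0, std) for _ in range(dim)]
        vocab[word] = len(vectors)
        vectors.append(vec)
    return vocab, vectors


# Toy "pre-trained" vectors, for illustration only.
pretrained = {"cat": [0.1] * EMBED_DIM, "dog": [0.2] * EMBED_DIM}
vocab, vectors = build_embedding_table(["cat", "zyzzyva", "dog", "zyzzyva"], pretrained)
print(vocab)  # {'cat': 0, 'zyzzyva': 1, 'dog': 2}
```

If the table is built once and the same vocab and vectors are kept for testing, an unknown word seen during training maps to the same random vector at test time. The standard deviation of 1.0 follows the question's "normal-randomized" wording; a smaller std may sit closer to the scale of the pre-trained vectors, which is worth checking on your own data.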

Any tools to programmatically convert Japanese sentence into its romaji (phonetical reading)? [closed]

Question: (Closed as off-topic for Stack Overflow 7 years ago.) Input: 日本が好きです. Output: Nippon ga sukidesu. Phonetic readings are unfortunately not available through the Google Translate API.

Answer 1: KAKASI is a good, simple tool for what you want to do:

    % echo "日本が好きです。" | iconv -f utf8 -t eucjp | kakasi -i euc -Ha -Ka -Ja -Ea -ka
    nippongasukidesu.
    % echo "日本が好きです。" | iconv -f

Methods for automated synonym detection

Question: I am currently working on a neural-network-based approach to short-document classification. The documents in my corpora are usually around ten words long, so the standard statistical document classification methods are of limited use. Because of this, I am attempting to implement some form of automated synonym detection for the matches provided in the training data. More specifically, my question is about resolving the following situation: say I have classifications of "Involving Food", and one
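
The question is truncated before it describes the specific situation, so this is only one common approach, not necessarily what the asker or any answer proposed: treat words whose embedding vectors have high cosine similarity as synonym candidates. The three-dimensional vectors, the 0.8 threshold and the function names are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_candidates(word, vectors, threshold=0.8):
    """Other words whose vectors reach `threshold` similarity, most similar first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, sim in scored if sim >= threshold]


# Made-up 3-dimensional vectors; real ones would come from a trained embedding model.
vectors = {"food": [0.9, 0.1, 0.0], "meal": [0.8, 0.2, 0.1], "car": [0.0, 0.1, 0.9]}
print(synonym_candidates("food", vectors))  # ['meal']
```

Distributional similarity also rewards related words that are not synonyms (antonyms often occur in similar contexts), so candidates found this way usually need manual review or filtering against a lexical resource.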

What does a weighted word embedding mean?

Question: The paper I am trying to implement says: "In this work, tweets were modeled using three types of text representation. The first one is a bag-of-words model weighted by tf-idf (term frequency - inverse document frequency) (Section 2.1.1). The second represents a sentence by averaging the word embeddings of all words (in the sentence) and the third represents a sentence by averaging the weighted word embeddings of all words, the weight of a word is given by tf-idf (Section 2.1.2)." I
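
Here is one way to read the paper's third representation, as a sketch under stated assumptions. It assumes tf is the raw count of a word in the tweet, idf(w) = log(N / df(w)) over the N tweets, and the weighted sum is divided by the total weight, so the sentence vector is Σ tfidf(w)·e(w) / Σ tfidf(w). Dividing by the word count instead is another common choice, and the quoted passage does not say which one the paper uses. The function names and the toy 2-dimensional embeddings are hypothetical.

```python
import math
from collections import Counter


def idf_table(corpus):
    """idf(w) = log(N / df(w)), with df(w) = number of sentences containing w."""
    n = len(corpus)
    df = Counter(word for sentence in corpus for word in set(sentence))
    return {word: math.log(n / count) for word, count in df.items()}


def weighted_sentence_vector(sentence, embeddings, idf):
    """tf-idf-weighted average of the word vectors in `sentence`.

    tf is the raw count of the word in the sentence; words without an
    embedding are skipped; the weighted sum is divided by the total weight.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, tf in Counter(w for w in sentence if w in embeddings).items():
        weight = tf * idf.get(word, 0.0)
        for i, x in enumerate(embeddings[word]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no informative words: return the zero vector
    return [x / weight_sum for x in total]


corpus = [["good", "movie"], ["bad", "movie"], ["good", "day"]]
embeddings = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "movie": [0.0, 1.0], "day": [0.0, -1.0]}
idf = idf_table(corpus)
print(weighted_sentence_vector(["bad", "movie"], embeddings, idf))
```

With these toy values the call prints roughly [-0.73, 0.27], whereas the plain (unweighted) average of the same two vectors, the paper's second representation, would be [-0.5, 0.5]. The rarer word "bad" has a higher idf, so it pulls the sentence vector further toward its own embedding.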

Implementing a perceptron classifier

Question: Hi, I'm pretty new to Python and to NLP. I need to implement a perceptron classifier. I searched through some websites but didn't find enough information. For now, I have a number of documents that I have grouped by category (sports, entertainment, etc.). I also have a list of the most frequently used words in these documents, along with their frequencies. One website stated that I need some sort of decision function that takes arguments x and w. x apparently is some sort of
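
The question breaks off while describing x, so this sketch assumes the usual reading: x is a document's bag-of-words count vector, w holds one weight per word, and the decision function scores w · x. Because the documents fall into several categories, it uses the multiclass perceptron variant: one weight vector per category, predict the category with the highest score, and update only on mistakes. The tiny training set and all function names are illustrative.

```python
from collections import Counter, defaultdict


def features(doc):
    """x: bag-of-words counts for a document."""
    return Counter(doc.lower().split())


def score(x, w):
    """Dot product w . x over the words present in x."""
    return sum(count * w.get(word, 0.0) for word, count in x.items())


def predict(x, weights):
    """Decision function: the category whose weight vector gives the highest score."""
    return max(weights, key=lambda category: score(x, weights[category]))


def train(docs, labels, epochs=10):
    """Multiclass perceptron: on each mistake, move weights toward the gold class."""
    weights = {c: defaultdict(float) for c in sorted(set(labels))}
    for _ in range(epochs):
        for doc, gold in zip(docs, labels):
            x = features(doc)
            guess = predict(x, weights)
            if guess != gold:
                for word, count in x.items():
                    weights[gold][word] += count   # reward the correct class
                    weights[guess][word] -= count  # penalize the wrong guess
    return weights


docs = ["football match goal", "a movie about football", "match team win", "film music concert"]
labels = ["sports", "entertainment", "sports", "entertainment"]
w = train(docs, labels)
print(predict(features("a new movie"), w))  # entertainment
```

The binary perceptron is the two-class special case. It uses a single w, and the sign of w · x (plus a bias term) decides the class.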

Unstructured Text to Structured Data

Question: I am looking for references (tutorials, books, academic literature) on structuring unstructured text in a manner similar to the Google Calendar Quick Add button. I understand this may fall under NLP, but I am interested only in the process of going from something like "Levi jeans size 32 A0b293" to: Brand: Levi, Size: 32, Category: Jeans, Code: A0b293. I imagine it would be some combination of lexical parsing and machine learning techniques. I am rather language agnostic
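
Below is a minimal rule-based sketch of the "lexical parsing" half of the idea. It assumes a small gazetteer (lookup table) of brands and categories, plus regular expressions for the size and the product code. BRANDS, CATEGORIES, the code pattern and parse_listing are all hypothetical and fitted to the single example in the question. A machine-learning approach would replace the hand-written rules with a model trained on labeled listings.

```python
import re

# Hypothetical lexicons fitted to the single example in the question.
BRANDS = {"levi": "Levi"}
CATEGORIES = {"jeans": "Jeans"}

SIZE_RE = re.compile(r"\bsize\s+(\d+)\b", re.IGNORECASE)
# Assumption: a product code is 5+ alphanumerics containing both letters and digits.
CODE_RE = re.compile(r"\b(?=[A-Za-z0-9]*\d)(?=[A-Za-z0-9]*[A-Za-z])[A-Za-z0-9]{5,}\b")


def parse_listing(text):
    """Turn a short free-text listing into a dict of named fields."""
    fields = {}
    for token in text.split():
        key = token.lower()
        if key in BRANDS:
            fields["Brand"] = BRANDS[key]
        elif key in CATEGORIES:
            fields["Category"] = CATEGORIES[key]
    size = SIZE_RE.search(text)
    if size:
        fields["Size"] = size.group(1)
    code = CODE_RE.search(text)
    if code:
        fields["Code"] = code.group(0)
    return fields


print(parse_listing("Levi jeans size 32 A0b293"))
# {'Brand': 'Levi', 'Category': 'Jeans', 'Size': '32', 'Code': 'A0b293'}
```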

Identifying and classifying Chinese sentence types in NLP: an implementation

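Contents
I. Main categories of Chinese sentence types: 1. Declarative sentences (statement); 2. Special sentences (special); 3. Interrogative sentences (question)
II. A brief analysis of Chinese sentence types
III. Labeling sentence types by combining syntactic parsing with regular expressions
IV. Survey of sentence types and summary of rules
V. Implementing sentypes, a Chinese sentence-type classification tool

I. Main categories of Chinese sentence types

1. Declarative sentences (statement)
Subject-initial (subject_front), e.g. 大家对这件事都很热心 ("Everyone is very keen on this matter.")
Topic-initial (theme_front), e.g. 红绿灯，真好玩 ("Traffic lights, such fun!")
Complex sentence (complex), e.g. 他们飞得好高好远，穿过白云，越过海洋 ("They flew so high and so far, through the white clouds and across the ocean.")

2. Special sentences (special)
把 construction (ba_struct), e.g. 阳光把冷冷的冬天赶走了 ("The sunshine drove away the cold winter.")
被 (passive) construction (bei_struct), e.g. 衣服被雨淋湿了 ("The clothes were soaked by the rain.")
Existential sentence (exist), e.g. 门口有两头狮子 ("There are two lions at the gate.")
Exclamatory sentence (sigh), e.g. 真谢谢你！ ("Thank you so much!")
Imperative sentence (Imperative), e.g. 小心！ ("Be careful!")
连 construction (lian_struct), e.g. 我不但眼睛不舒服，好像连耳朵也有点疼 ("Not only do my eyes feel uncomfortable, even my ears seem to hurt a little.")
是 construction (shi_struct), e.g. 我的爸爸是老师 ("My father is a teacher.")
Comparative sentence (compare), e.g. 我的力气比你大 ("I am stronger than you.")

3. Interrogative sentences (question)
Question-word (specific) questions (question_words), e.g. 你什么时候回来 ("When are you coming back?")
Yes/no questions (whether), e.g. 你今天会准时下课吗 ("Will class end on time today?")
Alternative questions (choice), e.g. 他是坐火车来的，还是坐汽车来的 ("Did he come by train or by car?")
Positive-negative (A-not-A) questions (pos_and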
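
The article's contents list a section that labels sentence types by combining syntactic parsing with regular expressions. The sketch below covers only the regular-expression half, for six of the categories above, and checks it against the article's own example sentences. RULES and sentence_type are hypothetical names, and the patterns are simplified assumptions rather than the rules of the sentypes tool.

```python
import re

# Simplified surface rules for a few of the categories above (an assumption,
# not the sentypes rules). Order matters: the first matching rule wins.
RULES = [
    ("question_words", re.compile(r"什么|谁|哪|怎么|多少")),
    ("whether", re.compile(r"吗[？?]?$")),
    ("choice", re.compile(r"还是")),
    ("ba_struct", re.compile(r"把")),
    ("bei_struct", re.compile(r"被")),
    ("compare", re.compile(r"比")),
]


def sentence_type(sentence):
    """Label a sentence with the first matching rule, defaulting to 'statement'."""
    text = sentence.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "statement"


examples = ["你什么时候回来", "你今天会准时下课吗", "他是坐火车来的，还是坐汽车来的",
            "阳光把冷冷的冬天赶走了", "衣服被雨淋湿了", "我的力气比你大", "大家对这件事都很热心"]
for s in examples:
    print(s, sentence_type(s))
```

Surface patterns like these misfire easily. For example, 比 also occurs in 比赛 ("competition"), and 还是 can mean "still" in a declarative sentence. Syntactic analysis is meant to resolve this kind of ambiguity.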