nlp

Run a GATE pipeline from inside a Java program without the GUI. Build a Tomcat app with GATE

我只是一个虾纸丫 posted on 2019-12-20 10:55:40
Question: I have built some plugin components for GATE and, together with the ANNIE tools, I am running a pipeline in the GATE platform. Does anyone know how I can run a pipeline from the console? I want to build a web application in Tomcat that takes plain text from a web page, passes it to the GATE pipeline I have built, and does something with the result. So I need to run GATE from a simple Java file. How can it be done? Thanks in advance, and sorry for my poor grammar. Answer 1: The GATE example code shows you how to

Natural Language Processing Solution in Java? [duplicate]

怎甘沉沦 posted on 2019-12-20 10:42:22
Question: This question already has answers here: Is there a good natural language processing library [closed] (3 answers). Closed 5 years ago. Are there any equally great packages like Python's NLTK in the Java world? Answer 1: Two popular ones that I know of are GATE and OpenNLP. Answer 2: Also, LingPipe is really nice. Answer 3: Stanford has a very good collection of NLP tools. Answer 4: For other JVM languages, see Scala: Scala NLP; Clojure: clojure-opennlp. Answer 5: ClearTK provides a Java framework for doing statistical NLP. Its

How to add new embeddings for unknown words in Tensorflow (training & pre-set for testing)

最后都变了- posted on 2019-12-20 10:37:45
Question: I am curious how I can add a normally distributed random 300-dimensional vector (element type tf.float32) whenever a word unknown to the pre-trained vocabulary is encountered. I am using pre-trained GloVe word embeddings, but in some cases I encounter unknown words, and I want to create a random normal word vector for each newly found unknown word. The problem is that with my current setup, I use tf.contrib.lookup.index_table_from_tensor to convert from words to integers based on
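To make the idea in the question concrete, here is a minimal sketch in plain Python rather than TensorFlow. It is not the asker's setup and does not use tf.contrib.lookup. The class name GrowingEmbedding, the toy vectors, and the dimension of 3 (instead of 300) are all assumptions chosen for illustration. The point it shows is that an unknown word gets a new row drawn from a normal distribution once, and that row is reused on later lookups.

```python
import random


class GrowingEmbedding:
    """Toy embedding table that grows when an unseen word is looked up.

    Hypothetical helper for illustration; not a TensorFlow API.
    """

    def __init__(self, pretrained, dim, seed=0):
        # pretrained: dict mapping word -> list of floats (stand-in for GloVe).
        self.dim = dim
        self.word_to_id = {}
        self.rows = []
        self.rng = random.Random(seed)
        for word, vec in pretrained.items():
            self._add(word, list(vec))

    def _add(self, word, vec):
        # The new word's integer id is its row index in the table.
        self.word_to_id[word] = len(self.rows)
        self.rows.append(vec)

    def lookup(self, word):
        # Unknown word: draw each component from N(0, 1) and store the row,
        # so the same word maps to the same vector on every later lookup.
        if word not in self.word_to_id:
            self._add(word, [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)])
        return self.rows[self.word_to_id[word]]


table = GrowingEmbedding({"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}, dim=3)
known = table.lookup("cat")        # pre-trained row, returned unchanged
new_vec = table.lookup("zyzzyva")  # unseen word, gets a fresh random row
```

In a real model the table would be a trainable matrix and the vocabulary would typically be extended before the graph is built; this sketch only illustrates the bookkeeping.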

Any tools to programmatically convert Japanese sentence into its romaji (phonetical reading)? [closed]

耗尽温柔 posted on 2019-12-20 10:25:37
Question: Closed. This question is off-topic. It is not currently accepting answers. Want to improve this question? Update the question so it's on-topic for Stack Overflow. Closed 7 years ago. Input: 日本が好きです. Output: Nippon ga sukidesu. Phonetic reading is unfortunately not available through the Google Translate API. Answer 1: KAKASI is a good, simple tool for what you want to do:
    % echo "日本が好きです。" | iconv -f utf8 -t eucjp | kakasi -i euc -Ha -Ka -Ja -Ea -ka
    nippongasukidesu.
    % echo "日本が好きです。" | iconv -f

Methods for automated synonym detection

和自甴很熟 posted on 2019-12-20 10:09:45
Question: I am currently working on a neural-network-based approach to short document classification, and since the documents in the corpora I work with are usually around ten words long, the standard statistical document classification methods are of limited use. Because of this, I am attempting to implement some form of automated synonym detection for the matches provided in the training. More specifically, my question is about resolving a situation as follows: Say I have classifications of "Involving Food", and one

What does a weighted word embedding mean?

白昼怎懂夜的黑 posted on 2019-12-20 10:08:05
Question: The paper that I am trying to implement says: "In this work, tweets were modeled using three types of text representation. The first one is a bag-of-words model weighted by tf-idf (term frequency - inverse document frequency) (Section 2.1.1). The second represents a sentence by averaging the word embeddings of all words (in the sentence) and the third represents a sentence by averaging the weighted word embeddings of all words, the weight of a word is given by tf-idf (Section 2.1.2)." I
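One plausible reading of the third representation in the quoted passage is sketched below in plain Python: each distinct word's embedding is multiplied by its tf-idf weight, and the sum is divided by the total weight. This is an assumption, not the paper's confirmed method; in particular, the smoothed idf formula and the choice to normalize by the sum of weights (rather than by the word count) are choices made for this sketch, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """Return a tf-idf weight for each distinct token of one document."""
    n_docs = len(corpus)
    counts = Counter(doc_tokens)
    weights = {}
    for word, count in counts.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for doc in corpus if word in doc)
        # Smoothed idf (assumed variant), always positive.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def weighted_sentence_vector(doc_tokens, corpus, embeddings):
    """Average the word embeddings of a sentence, weighted by tf-idf."""
    weights = tfidf_weights(doc_tokens, corpus)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word, weight in weights.items():
        if word not in embeddings:
            continue  # skip words without an embedding
        for i, value in enumerate(embeddings[word]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [value / weight_sum for value in total]


corpus = [["good", "movie"], ["bad", "movie"]]
embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0], "bad": [-1.0, 0.0]}
vec = weighted_sentence_vector(["good", "movie"], corpus, embeddings)
```

Here "good" appears in only one document, so it receives a larger weight than "movie", and the sentence vector leans toward the embedding of "good". With equal weights the same function reduces to the plain average described as the second representation.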

What does a weighted word embedding mean?

为君一笑 posted on 2019-12-20 10:08:03
Question: The paper that I am trying to implement says: "In this work, tweets were modeled using three types of text representation. The first one is a bag-of-words model weighted by tf-idf (term frequency - inverse document frequency) (Section 2.1.1). The second represents a sentence by averaging the word embeddings of all words (in the sentence) and the third represents a sentence by averaging the weighted word embeddings of all words, the weight of a word is given by tf-idf (Section 2.1.2)." I

Implementing a perceptron classifier

爷,独闯天下 posted on 2019-12-20 10:05:08
Question: Hi, I'm pretty new to Python and to NLP. I need to implement a perceptron classifier. I searched through some websites but didn't find enough information. For now I have a number of documents which I grouped according to category (sports, entertainment, etc.). I also have a list of the most used words in these documents along with their frequencies. One website stated that I must have some sort of decision function accepting arguments x and w. x apparently is some sort of
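As a rough illustration of the decision function mentioned in the question, the sketch below trains a binary perceptron in plain Python. It assumes that x is a vector of word counts over a fixed vocabulary and w is a weight vector of the same length; the vocabulary, the toy counts, and the two-class setup (sports vs. entertainment) are made up for this example.

```python
def predict(weights, bias, x):
    """Decision function: +1 if w.x + b is non-negative, otherwise -1."""
    score = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else -1


def train_perceptron(samples, labels, epochs=10, lr=1.0):
    """Classic perceptron rule: update the weights only on mistakes."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            if predict(weights, bias, x) != y:
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


# Word counts over the hypothetical vocabulary ["goal", "match", "film", "actor"].
samples = [[3, 2, 0, 0], [2, 1, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]]
labels = [1, 1, -1, -1]  # +1 = sports, -1 = entertainment
weights, bias = train_perceptron(samples, labels)
```

For more than two categories, a common approach is to train one such classifier per category (one-vs-rest) and pick the category with the highest score.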

Unstructured Text to Structured Data

≯℡__Kan透↙ posted on 2019-12-20 09:45:02
Question: I am looking for references (tutorials, books, academic literature) on structuring unstructured text in a manner similar to the Google Calendar Quick Add button. I understand this may come under the NLP category, but I am interested only in the process of going from something like "Levi jeans size 32 A0b293" to: Brand: Levi, Size: 32, Category: Jeans, Code: A0b293. I imagine it would be some combination of lexical parsing and machine learning techniques. I am rather language agnostic
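The lexical-parsing half of that idea can be sketched with small lookup tables and a regular expression, as below. This is only a toy baseline under stated assumptions: the BRANDS and CATEGORIES lexicons, the rule that a product code mixes letters and digits and has at least five characters, and the function name parse_listing are all hypothetical. A real system would likely learn these labels from annotated examples, for instance with a sequence tagger.

```python
import re

# Hypothetical lexicons; in practice these would come from a product catalog.
BRANDS = {"levi": "Levi", "levis": "Levi", "levi's": "Levi"}
CATEGORIES = {"jeans": "Jeans", "jean": "Jeans", "shirt": "Shirt"}

# Assumed shape of a product code: alphanumeric, at least five characters,
# containing at least one letter and at least one digit.
CODE_RE = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9]{5,}$")


def parse_listing(text):
    """Map a free-text product string to a dict of labeled fields."""
    record = {}
    tokens = text.split()
    for i, token in enumerate(tokens):
        low = token.lower()
        if low in BRANDS:
            record["Brand"] = BRANDS[low]
        elif low in CATEGORIES:
            record["Category"] = CATEGORIES[low]
        elif low == "size" and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            # The number right after the word "size" is taken as the size.
            record["Size"] = tokens[i + 1]
        elif CODE_RE.match(token):
            record["Code"] = token
    return record


result = parse_listing("Levi jeans size 32 A0b293")
```

Rules like these are brittle when the word order or vocabulary changes, which is where the machine learning techniques mentioned in the question come in.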

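Identifying and classifying Chinese sentence types in NLP: an implementation

半城伤御伤魂 posted on 2019-12-20 09:03:32
Contents
  I. Main categories of Chinese sentence types
     1. Declarative sentences (statement)
     2. Special sentences (special)
     3. Interrogative sentences (question)
  II. A brief analysis of Chinese sentence types
  III. Labeling sentence types by combining syntactic parsing with regular expressions
  IV. Survey of sentence types and summary of rules
  V. Implementing sentypes, a tool for classifying Chinese sentence types

I. Main categories of Chinese sentence types

1. Declarative sentences (statement)
   Subject-first (subject_front), e.g. 大家对这件事都很热心 ("Everyone is very enthusiastic about this")
   Topic-first (theme_front), e.g. 红绿灯,真好玩 ("Traffic lights, they are really fun")
   Complex sentence (complex), e.g. 他们飞的好高好远,穿过白云,越过海洋 ("They flew so high and so far, through the white clouds and across the ocean")

2. Special sentences (special)
   ba-construction (把) sentence (ba_struct), e.g. 阳光把冷冷的冬天赶走了 ("The sunshine drove the cold winter away")
   bei-construction (被) sentence (bei_struct), e.g. 衣服被雨淋湿了 ("The clothes got soaked by the rain")
   Existential sentence (exist), e.g. 门口有两头狮子 ("There are two lions at the gate")
   Exclamatory sentence (sigh), e.g. 真谢谢你! ("Thank you so much!")
   Imperative sentence (Imperative), e.g. 小心! ("Careful!")
   lian-construction (连) sentence (lian_struct), e.g. 我不但眼睛不舒服,好像连耳朵也有点疼 ("Not only are my eyes uncomfortable, even my ears seem to hurt a little")
   shi-construction (是) sentence (shi_struct), e.g. 我的爸爸是老师 ("My father is a teacher")
   Comparative sentence (compare), e.g. 我的力气比你大 ("I am stronger than you")

3. Interrogative sentences (question)
   Question-word (wh-) question (question_words), e.g. 你什么时候回来 ("When are you coming back")
   Yes/no question (whether), e.g. 你今天会准时下课吗 ("Will you finish class on time today")
   Alternative question (choice), e.g. 他是坐火车来的,还是坐汽车来的 ("Did he come by train or by bus")
   Affirmative-negative question (pos_and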
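To give a feel for the regular-expression side of such labeling, here is a deliberately naive sketch in Python. It is not the sentypes implementation: it uses keyword patterns only, with no syntactic parsing, and the rule list, its ordering, and the function name sentence_type are assumptions made for illustration. It covers just a few of the labels from the outline and falls back to "statement" when nothing matches.

```python
import re

# Ordered keyword rules; the first matching rule wins.
# Labels reuse the tags from the outline (whether, choice, ba_struct, ...).
RULES = [
    ("whether", re.compile(r"吗[?？]?$")),            # sentence-final 吗
    ("choice", re.compile(r"还是")),                  # "A or B" questions
    ("question_words", re.compile(r"什么|谁|哪|怎么|多少")),
    ("ba_struct", re.compile(r"把")),
    ("bei_struct", re.compile(r"被")),
    ("compare", re.compile(r"比")),
]


def sentence_type(sentence):
    """Return the label of the first rule that matches, else 'statement'."""
    text = sentence.strip()
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "statement"


examples = [
    "你今天会准时下课吗",
    "他是坐火车来的,还是坐汽车来的",
    "你什么时候回来",
    "阳光把冷冷的冬天赶走了",
    "衣服被雨淋湿了",
    "我的力气比你大",
    "大家对这件事都很热心",
]
labels = [sentence_type(s) for s in examples]
```

Single-character cues such as 把 and 比 also occur in other roles (for example 把 as a measure word), which is one reason the outline pairs regular expressions with syntactic analysis instead of relying on keywords alone.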