nlp

Reusable version of DKPro Core pipeline

痴心易碎 submitted on 2020-01-03 00:27:15

Question: I have set up DKPro Core as a web service to take an input and provide a tokenised output. The service itself is set up as a Jersey resource:

    @Path("/")
    public class MyResource {

        public MyResource() {
            // Nothing here
        }

        @GET
        public String generate(@QueryParam("q") final String input) {
            try {
                final JCasIterable en = iteratePipeline(
                        createReaderDescription(StringReader.class,
                                StringReader.PARAM_DOCUMENT_TEXT, input,
                                StringReader.PARAM_LANGUAGE, "en"),
                        createEngineDescription(StanfordSegmenter…
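
The code above rebuilds the entire pipeline on every GET request; the usual fix is to construct the expensive pipeline object once at startup and reuse it per request. A minimal sketch of that pattern in Python, using spaCy and Flask as stand-ins (DKPro Core itself is Java-only, so the framework, route, and model name here are illustrative assumptions, not the question's actual stack):

    # "Build once, reuse per request": the costly object is created a single
    # time at import, and each request only runs text through it.
    import spacy
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    nlp = spacy.load("en_core_web_sm")   # expensive: load exactly once

    @app.route("/")
    def tokenize():
        text = request.args.get("q", "")
        doc = nlp(text)                  # cheap: reuse the loaded pipeline
        return jsonify([token.text for token in doc])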

Unscrambling words in a sentence using Natural Language Generation

╄→гoц情女王★ submitted on 2020-01-03 00:06:39

Question: I have a sentence in English. Now I want to jumble the words up and feed that set of words into a program, which should unscramble the words according to the normal rules of English grammar and output the original sentence. I can vaguely assume it would require Natural Language Generation algorithms. For example:

    Sentence: Mary has gone for a walk with her dog.
    Set of words: {has, for, a, with, her, dog, Mary, gone, walk}

The output should be the same sentence. I can assume only the set of words will…
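
A brute-force baseline makes the problem concrete: score every permutation of the word set with a language model and keep the best-scoring order. The sketch below uses hand-made bigram counts purely for illustration; a real system would estimate them from a large corpus and use beam search, since n! permutations explode quickly:

    # Toy unscrambler: pick the permutation with the highest bigram score.
    # Words are case-folded for simplicity; counts are made up for the demo.
    from itertools import permutations

    BIGRAMS = {
        ("mary", "has"): 5, ("has", "gone"): 7, ("gone", "for"): 6,
        ("for", "a"): 9, ("a", "walk"): 8, ("walk", "with"): 4,
        ("with", "her"): 6, ("her", "dog"): 5,
    }

    def score(order):
        # Sum the counts of all adjacent word pairs in this ordering.
        return sum(BIGRAMS.get(pair, 0) for pair in zip(order, order[1:]))

    words = ["has", "for", "a", "with", "her", "dog", "mary", "gone", "walk"]
    best = max(permutations(words), key=score)
    print(" ".join(best))  # mary has gone for a walk with her dog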

Stanford CoreNLP: Use partial existing annotation

烈酒焚心 submitted on 2020-01-02 15:25:14

Question: We are trying to use our existing tokenization, sentence splitting, and named entity tagging, while we would like to use Stanford CoreNLP to additionally provide us with part-of-speech tagging, lemmatization, and parsing. Currently, we are trying it the following way:

1) Make an annotator for "pos, lemma, parse":

    Properties pipelineProps = new Properties();
    pipelineProps.put("annotators", "pos, lemma, parse");
    pipelineProps.setProperty("parse.maxlen", "80");
    pipelineProps.setProperty("pos.maxlen", "80");
    …
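
The same idea is easy to show in Python with Stanza (Stanford's Python library, not the Java CoreNLP API the question uses): hand the pipeline your own tokens and sentence splits and let it add tagging, lemmas, and parses on top. The sentence below is illustrative; tokenize_pretokenized is a real Stanza option:

    # Assumes the English models were fetched once via stanza.download("en").
    import stanza

    nlp = stanza.Pipeline(
        lang="en",
        processors="tokenize,pos,lemma,depparse",
        tokenize_pretokenized=True,  # trust the given tokens/sentences as-is
    )
    pretokenized = [["Mary", "has", "gone", "for", "a", "walk", "."]]
    doc = nlp(pretokenized)
    for word in doc.sentences[0].words:
        print(word.text, word.upos, word.lemma, word.head, word.deprel)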

How to use Stanford LexParser for Chinese text?

白昼怎懂夜的黑 submitted on 2020-01-02 09:16:22

Question: I can't seem to get the correct input encoding for Stanford NLP's LexParser. How do I use the Stanford LexParser for Chinese text? I've done the following to download the tool:

    $ wget http://nlp.stanford.edu/software/stanford-parser-full-2015-04-20.zip
    $ unzip stanford-parser-full-2015-04-20.zip
    $ cd stanford-parser-full-2015-04-20/

And my input text is in UTF-8:

    $ echo "应有尽有 的 丰富 选择 定 将 为 您 的 旅程 增添 无数 的 赏心 乐事 。" > input.txt
    $ echo "应有尽有#VV 的#DEC 丰富#JJ 选择#NN 定#VV 将#AD 为#P 您#PN 的#DEG 旅程#NN 增添…
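
A hedged sketch of one way to run the parser on Chinese input: load a Chinese grammar instead of the default English one and pass an explicit UTF-8 encoding flag. The model path and flags below follow the conventions of the 2015 parser distribution but should be verified against the actual jar contents before relying on them:

    # Invoke the LexParser from Python with the Chinese PCFG grammar.
    # The classpath wildcard is expanded by the JVM, not the shell.
    import subprocess

    subprocess.run([
        "java", "-mx2g",
        "-cp", "stanford-parser-full-2015-04-20/*",
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-encoding", "utf-8",
        "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz",
        "input.txt",
    ], check=True)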

Older versions of spaCy throw a “KeyError: 'package'” error when trying to install a model

南楼画角 submitted on 2020-01-02 08:05:29

Question: I use spaCy 1.6.0 on Ubuntu 14.04.4 LTS x64 with Python 3.5. To install the English model of spaCy, I tried to run python3.5 -m spacy.en.download. This gives me the error message:

    ubun@ner-3:~/NeuroNER-master/src$ python3.5 -m spacy.en.download
    Downloading parsing model
    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/usr/local/lib/python3.5/dist-packages/spacy…
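
For comparison, a hedged sketch of the modern flow: on current spaCy releases the sputnik-based downloader that produced this traceback no longer exists, and models install as ordinary pip packages. This is an upgrade path rather than a fix for 1.6.0 itself, and may not suit a project pinned to the old API:

    # After: pip install -U spacy
    #        python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    print([token.text for token in nlp("This is a test.")])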

Split text file at sentence boundary

杀马特。学长 韩版系。学妹 submitted on 2020-01-02 08:05:04

Question: I have to process a text file (an e-book). I'd like to process it so that there is one sentence per line (a "newline-separated file", yes?). How would I do this task using the UNIX utility sed? Does it have a symbol for "sentence boundary", like the symbol for "word boundary" (I think the GNU version has one)? Please note that a sentence can end in a period, ellipsis, question mark, or exclamation mark, the last two also in combination (for example, ?, !, !?, !!!!! are all valid "sentence terminators").
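
For reference, sed only offers word-boundary escapes (GNU sed's \b and \<, \>); there is no sentence-boundary symbol. A minimal stand-in in Python, swapped in for the sed approach the question asks about, splits at whitespace that follows one of the terminators the question lists (abbreviations such as "Mr." would need extra handling):

    import re

    def split_sentences(text):
        # Split at whitespace preceded by ., !, ?, or an ellipsis character;
        # runs like "!?" or "!!!!!" end in one of these, so they match too.
        return [s for s in re.split(r"(?<=[.!?\u2026])\s+", text) if s]

    book = "He left. Really?! Yes!!! What a day..."
    print("\n".join(split_sentences(book)))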

Stanford.NLP for .NET not loading models

让人想犯罪 __ submitted on 2020-01-02 07:11:23

Question: I am trying to run the sample code provided here for Stanford.NLP for .NET. I installed the package via NuGet, downloaded the CoreNLP zip archive, and extracted stanford-corenlp-3.7.0-models.jar. After extracting, I located the "models" directory in stanford-corenlp-full-2016-10-31\edu\stanford\nlp\models. Here is the code that I am trying to run:

    public static void Test1()
    {
        // Path to the folder with models extracted from `stanford-corenlp-3.6.0-models.jar`
        var jarRoot = @"..\..\..\stanford…

Understanding LDA Transformed Corpus in Gensim

*爱你&永不变心* submitted on 2020-01-02 06:53:12

Question: I tried to examine the contents of the BOW corpus vs. the LDA[BOW corpus] (transformed by an LDA model trained on that corpus with, say, 35 topics). I found the following output:

    DOC 1 : [(1522, 1), (2028, 1), (2082, 1), (6202, 1)]
    LDA 1 : [(29, 0.80571428571428572)]
    DOC 2 : [(1522, 1), (5364, 1), (6202, 1), (6661, 1), (6983, 1)]
    LDA 2 : [(29, 0.83809523809523812)]
    DOC 3 : [(3079, 1), (3395, 1), (4874, 1)]
    LDA 3 : [(34, 0.75714285714285712)]
    DOC 4 : [(1482, 1), (2806, 1), (3988, 1)]
    LDA 4 : [(22,…
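
One detail worth knowing when reading such output: gensim suppresses topics whose probability falls below the model's minimum_probability cutoff, which is why each document appears to belong to a single topic. A small sketch with an illustrative toy corpus (the texts and num_topics are made up; minimum_probability is a real LdaModel parameter):

    from gensim import corpora, models

    texts = [["human", "interface", "computer"],
             ["graph", "trees", "minors"],
             ["human", "trees", "graph"]]
    dictionary = corpora.Dictionary(texts)
    bow_corpus = [dictionary.doc2bow(t) for t in texts]

    # minimum_probability=0.0 reports the (near-)full topic distribution
    # instead of hiding low-probability topics.
    lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=2,
                          minimum_probability=0.0)
    for bow in bow_corpus:
        print(bow, "->", lda[bow])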

PyTorch: Relation between Dynamic Computational Graphs - Padding - DataLoader

て烟熏妆下的殇ゞ submitted on 2020-01-02 05:27:06

Question: As far as I understand, the strength of PyTorch is supposed to be that it works with dynamic computational graphs. In the context of NLP, that means that sequences of variable length do not necessarily need to be padded to the same length. But if I want to use the PyTorch DataLoader, I need to pad my sequences anyway, because the DataLoader only takes tensors, given that I, as a total beginner, do not want to build a customized collate_fn. Now this makes me wonder: doesn't this wash away…
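
The customized collate_fn the question wants to avoid is in fact quite small: it pads each batch only to the length of that batch's longest sequence, so padding stays per-batch rather than corpus-wide. A minimal sketch (the toy sequences are illustrative; pad_sequence and DataLoader's collate_fn parameter are real PyTorch APIs):

    import torch
    from torch.nn.utils.rnn import pad_sequence
    from torch.utils.data import DataLoader

    sequences = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]),
                 torch.tensor([6, 7, 8, 9]), torch.tensor([10])]

    def collate(batch):
        # Keep original lengths so a model can use pack_padded_sequence later.
        lengths = torch.tensor([len(seq) for seq in batch])
        padded = pad_sequence(batch, batch_first=True, padding_value=0)
        return padded, lengths

    loader = DataLoader(sequences, batch_size=2, collate_fn=collate)
    for padded, lengths in loader:
        print(padded, lengths)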