nlp

[Academic Research] Notes 2020-1-12-3: Tsinghua University – Harbin Institute of Technology Academic Exchange, Roundtable Discussion

别说谁变了你拦得住时间么 Submitted on 2020-01-12 14:47:01
Discussion topics: In research work, how can we both leverage BERT's power and still highlight our own contribution? After word2vec, the Transformer, ELMo, BERT, and XLNet, what will the next generation of NLP models look like?

Zhiyuan Liu: We should think about how BERT itself came to be proposed — it is a matter of research taste, and of the impact BERT has had on the field. Incorporating knowledge is important: language is a symbol system, and the meaning behind it requires external information for deep understanding, so bringing knowledge into models is a crucial direction. One thread is the use of unsupervised large-scale data; another is the use of supervised labeled data. What is our next step? Interpretability; efficiency and simplification; future models that run on different devices, with adjustable size but undiminished performance; cross-media work, which began inside NLP and may later tackle parameter and data problems in integrated scenarios; and knowledge and rules — if BERT is strong enough, can we use tools and higher-order rules to generate corpora and combine them with the right words? Language takes many forms, but semantics is unified.

Yang Liu: Listed the ACL best papers from 2002 to 2019 — at the time nobody even knew those concepts existed (in translation, for instance: why couldn't systems do such things?). The big waves keep changing; if you are not at the front of the tide, you lag behind, and there is still a large gap in leading NLP tooling. At any given moment it is hard to know where the frontier is, so read the literature, read papers. The core question of AI is: where does knowledge come from? Either humans build it (knowledge graphs), or it is carried by data and mined from data (deep learning, AlphaGo).

Convert plural nouns into singular nouns

给你一囗甜甜゛ Submitted on 2020-01-12 14:32:10
Question: How can plural nouns be converted into singular nouns using R? I use the tagPOS function, which tags each text, and then extract all plural nouns tagged as "NNS". But what do I do if I want to convert those plural nouns into singular ones? library("openNLP") library("tm") acq_o <- "Gulf Applied Technologies Inc said it sold its subsidiaries engaged in pipelines and terminal operations for 12.2 mln dlrs. The company said the sale is subject to certain post closing
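The question asks about R, but the core of the task is language-agnostic morphology. A minimal rule-based sketch in Python, covering only the regular English plural patterns (irregular plurals like "children" would need a lexicon or a proper lemmatizer):

```python
def singularize(word):
    """Naively convert a regular English plural noun to singular.

    Rule-based sketch only; irregular plurals ("children", "mice")
    require a lexicon or a lemmatizer.
    """
    if word.endswith("ies") and len(word) > 3:
        return word[:-3] + "y"        # "subsidiaries" -> "subsidiary"
    if word.endswith(("ses", "xes", "zes", "ches", "shes")):
        return word[:-2]              # "boxes" -> "box"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]              # "pipelines" -> "pipeline"
    return word                       # "glass" stays "glass"

print(singularize("subsidiaries"))  # subsidiary
print(singularize("pipelines"))     # pipeline
```

The same handful of suffix rules translate directly to R with `sub()`; for robust results one would instead feed the NNS tokens to a lemmatizer.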

Shorten a text and only keep important sentences

谁说胖子不能爱 Submitted on 2020-01-12 08:44:57
Question: The German website nandoo.net offers the possibility to shorten a news article. If you change the percentage value with a slider, the text changes and some sentences are left out. You can see it in action here: http://www.nandoo.net/read/article/299925/ The news article is on the left side, and tags are highlighted. The slider is at the top of the second column. The more you move the slider to the left, the shorter the text becomes. How can you implement something like that? Are there any algorithms
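One common family of algorithms behind such sliders is frequency-based extractive summarization: score each sentence by how many frequent content words it contains, then keep the top fraction chosen by the slider. A crude stdlib-only sketch (real systems add stopword removal, TextRank-style graph scoring, etc.):

```python
import re
from collections import Counter

def shorten(text, keep_ratio):
    """Keep the top keep_ratio fraction of sentences, scored by the
    average corpus frequency of their words (crude sketch of
    frequency-based extractive summarization)."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    scores = [
        sum(freq[w] for w in re.findall(r'\w+', s.lower())) / max(len(s.split()), 1)
        for s in sentences
    ]
    n_keep = max(1, round(len(sentences) * keep_ratio))
    keep = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return ' '.join(sentences[i] for i in sorted(keep))  # preserve original order

text = ("The cat sat on the mat. Dogs bark loudly at night. "
        "The cat chased the dog off the mat.")
print(shorten(text, 0.34))  # keeps the single highest-scoring sentence
```

Moving the slider simply re-runs the selection with a different `keep_ratio`; since each sentence's score is fixed, the shorter text is always a subset of the longer one, which matches the behaviour described.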

Most efficient way to index words in a document?

﹥>﹥吖頭↗ Submitted on 2020-01-12 05:24:56
Question: This came up in another question, but I figured it is best to ask it separately. Given a large list of sentences (on the order of 100 thousand): [ "This is sentence 1 as an example", "This is sentence 1 as another example", "This is sentence 2", "This is sentence 3 as another example ", "This is sentence 4" ] what is the best way to code the following function? def GetSentences(word1, word2, position): return "" where, given two words word1 and word2 and a position position, the function
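The question is truncated, so the exact semantics of `position` are an assumption here; a common reading is "return the sentences where `word1` occurs at that 0-based position with `word2` immediately after it". Under that assumption, an inverted index over (word, position) pairs turns each query into two dictionary lookups and a set intersection instead of a scan over all 100k sentences:

```python
from collections import defaultdict

def build_index(sentences):
    """Map (word, position) -> set of sentence indices. Built once,
    then each query is two dict hits plus an intersection."""
    index = defaultdict(set)
    for i, s in enumerate(sentences):
        for pos, word in enumerate(s.split()):
            index[(word, pos)].add(i)
    return index

def get_sentences(index, sentences, word1, word2, position):
    """Sentences with word1 at `position` and word2 right after it
    (assumed semantics; the original question is cut off)."""
    hits = index[(word1, position)] & index[(word2, position + 1)]
    return [sentences[i] for i in sorted(hits)]

corpus = [
    "This is sentence 1 as an example",
    "This is sentence 1 as another example",
    "This is sentence 2",
    "This is sentence 3 as another example",
    "This is sentence 4",
]
idx = build_index(corpus)
print(get_sentences(idx, corpus, "sentence", "1", 2))  # first two sentences
```

The index costs O(total words) memory, which is usually acceptable for 100k sentences.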

How to combine TFIDF features with other features

霸气de小男生 Submitted on 2020-01-12 04:44:07
Question: I have a classic NLP problem: I have to classify news as fake or real. I have created two sets of features: A) bigram Term Frequency-Inverse Document Frequency; B) approximately 20 features per document obtained using pattern.en (https://www.clips.uantwerpen.be/pages/pattern-en), such as subjectivity of the text, polarity, #stopwords, #verbs, #subjects, grammatical relations, etc. What is the best way to combine the TF-IDF features with the other features for a single prediction?
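The usual answer is simple feature concatenation: each document's TF-IDF row gets its ~20 extra features appended as additional columns, and a single classifier is trained on the widened matrix. A tiny stdlib-only sketch of the idea (in practice one would use sklearn's TfidfVectorizer and scipy.sparse.hstack, and scale the dense extras):

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Tiny TF-IDF over whitespace tokens; illustrative only."""
    N = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    vocab = sorted(df)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[w] / len(toks) * math.log(N / df[w]) for w in vocab])
    return vocab, rows

def combine(tfidf_rows, extra_rows):
    """Concatenate each TF-IDF row with its per-document extra
    features (subjectivity, polarity, counts, ...)."""
    return [t + e for t, e in zip(tfidf_rows, extra_rows)]

docs = ["fake news spreads fast", "real news is verified"]
extra = [[0.9, 0.2], [0.1, 0.8]]  # hypothetical subjectivity, polarity per doc
vocab, X = tfidf_matrix(docs)
Xc = combine(X, extra)
print(len(vocab), len(Xc[0]))  # feature width = vocab size + 2 extras
```

Because the dense extras live on a different scale than TF-IDF weights, scaling them (or using a model robust to scale, like gradient-boosted trees) usually matters more than the concatenation itself.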

How does spacy use word embeddings for Named Entity Recognition (NER)?

我只是一个虾纸丫 Submitted on 2020-01-11 18:54:27
Question: I'm trying to train an NER model using spaCy to identify locations, (person) names, and organisations. I'm trying to understand how spaCy recognises entities in text, and I've not been able to find an answer. From this issue on GitHub and this example, it appears that spaCy uses a number of features present in the text, such as POS tags, prefixes, suffixes, and other character- and word-based features, to train an Averaged Perceptron. However, nowhere in the code does it appear that
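To make the feature side concrete, here is a sketch of the kind of hand-crafted token features the question describes (prefixes, suffixes, word shape, neighbouring words) as they would feed a linear model such as an averaged perceptron. This is illustrative only and is not spaCy's actual feature set:

```python
def token_features(tokens, i):
    """Hand-crafted features for token i, of the kind used by
    averaged-perceptron NER taggers (illustrative; not spaCy's
    real feature templates)."""
    w = tokens[i]
    return {
        "word": w.lower(),
        "prefix3": w[:3].lower(),
        "suffix3": w[-3:].lower(),
        "is_title": w.istitle(),
        "is_upper": w.isupper(),
        # word shape: John -> Xxxx, IBM -> XXX, B2B -> XdX
        "shape": "".join("X" if c.isupper() else "x" if c.islower()
                         else "d" if c.isdigit() else c for c in w),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

print(token_features("John works at Google".split(), 0))
```

In spaCy v2+ the statistical NER component replaces such sparse templates with learned embeddings fed into a neural transition-based parser, which is why these features are hard to spot in the current codebase.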

How to use NLTK to generate sentences from an induced grammar?

余生长醉 Submitted on 2020-01-11 17:43:54
Question: I have a (large) list of parsed sentences (parsed with the Stanford parser); for example, the sentence "Now you can be entertained" has the following tree: (ROOT (S (ADVP (RB Now)) (, ,) (NP (PRP you)) (VP (MD can) (VP (VB be) (VP (VBN entertained)))) (. .))) I am using the set of sentence trees to induce a grammar using nltk: import nltk # ... for each sentence tree t, add its productions to allProductions allProductions += t.productions() # Induce the grammar S = nltk.Nonterminal
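NLTK itself offers `nltk.parse.generate.generate(grammar)` for exactly this. To show the mechanics without the library, here is a stdlib-only toy analogue: recursively expand each nonterminal and take the cartesian product of the expansions, with a depth bound so recursive grammars terminate (the grammar below is a hypothetical fragment, not the one induced from the Stanford trees):

```python
import itertools

def generate(grammar, symbol="S", depth=4):
    """Enumerate word sequences derivable from `symbol`, up to a
    recursion depth (toy analogue of nltk.parse.generate.generate)."""
    if symbol not in grammar:          # terminal symbol
        yield [symbol]
        return
    if depth == 0:                     # bound recursive grammars
        return
    for production in grammar[symbol]:
        # expand every symbol on the right-hand side, then combine
        expansions = [list(generate(grammar, sym, depth - 1)) for sym in production]
        for combo in itertools.product(*expansions):
            yield [w for part in combo for w in part]

grammar = {  # hypothetical CFG fragment
    "S": [["NP", "VP"]],
    "NP": [["you"], ["we"]],
    "VP": [["MD", "VB"]],
    "MD": [["can"], ["must"]],
    "VB": [["go"], ["be", "entertained"]],
}
for sent in generate(grammar):
    print(" ".join(sent))  # 2 x 2 x 2 = 8 sentences
```

With an induced grammar the same enumeration explodes combinatorially, so sampling one production at random per expansion is the usual alternative for large grammars.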

Text Summarization Evaluation - BLEU vs ROUGE

穿精又带淫゛_ Submitted on 2020-01-11 16:37:14
Question: With the results of two different summary systems (sys1 and sys2) and the same reference summaries, I evaluated them with both BLEU and ROUGE. The problem is: all ROUGE scores of sys1 were higher than those of sys2 (ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-4, ROUGE-L, ROUGE-SU4, ...), but the BLEU score of sys1 was lower than that of sys2 (by quite a lot). So my question is: both ROUGE and BLEU are n-gram based and measure the similarity between system summaries and human summaries. So why
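The short answer is that BLEU is precision-oriented while ROUGE is recall-oriented, so a verbose system can win on ROUGE while a terse one wins on BLEU. A minimal unigram-only sketch (real BLEU adds higher-order n-grams and a brevity penalty; real ROUGE reports precision/recall/F too) reproduces exactly the reported disagreement:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision (the core of BLEU-1): what share
    of candidate words also appear in the reference."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / sum(cand.values())

def unigram_recall(candidate, reference):
    """Unigram recall (the core of ROUGE-1): what share of
    reference words are covered by the candidate."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum(min(c, cand[w]) for w, c in ref.items())
    return overlap / sum(ref.values())

ref  = "the cat sat on the mat"
sys1 = "the cat sat on the mat today in the sun"  # long: high recall, low precision
sys2 = "the cat sat"                              # short: high precision, low recall

print(unigram_precision(sys1, ref), unigram_recall(sys1, ref))  # 0.6 1.0
print(unigram_precision(sys2, ref), unigram_recall(sys2, ref))  # 1.0 0.5
```

Here the longer sys1 beats sys2 on the recall-style (ROUGE-like) score while losing on the precision-style (BLEU-like) score, mirroring the question's observation.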