nlp | 易学教程

NLP overview

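I have been working with NLP for quite a while, but my knowledge has stayed at the level of isolated facts, without a good picture of the field as a whole. These notes organize my thoughts.
1. Definition of NLP
Starting, as usual, with the definition: NLP (Natural Language Processing) uses computers to process, understand, and apply human language, with the goal of enabling communication between people and machines. It divides into two parts: natural language understanding and natural language generation. (The original post summarizes this in a table.)
2. Applications of NLP
Machine translation: translating one language into another, for example Baidu Translate.
Sentiment analysis: mining people's opinions, emotions, and attitudes toward products and services in order to evaluate them. It originated with blogs and other social networks.
Question answering: a bot answers the questions it is given, as in the automated Q&A on some websites.
Summarization: the machine accurately condenses a text and produces a summary.
Text classification: collecting articles, analyzing their topics, and classifying them automatically, for example spam filtering.
Public opinion analysis: judging the direction of public opinion from its content, analyzing how it spreads and develops, and effectively containing harmful opinion.
Knowledge graph: a network graph that connects pieces of knowledge through the relations between them.
3. Knowledge required for NLP
As a fairly popular field, NLP requires knowledge of linguistics, statistics, machine learning, deep learning, and the theory of natural language.
Syntactic and semantic analysis: for a target sentence, this covers word segmentation, part-of-speech tagging, named entity recognition, syntactic parsing, semantic role labeling, and word sense disambiguation.
Keyword extraction: extracting the main information from the target text, chiefly who, when, why, what was done, and with what result. It mainly involves entity recognition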

How to install the Python package pyrouge on Microsoft Windows?

Question: I want to use the Python package pyrouge on Microsoft Windows. The package doesn't give any instructions on how to install it on Microsoft Windows. How can I do so?
Answer 1: The following instructions were tested on Windows 7 SP1 x64 Ultimate and Python 3.5 x64 (Anaconda).
1) In cmd.exe, run pip install pyrouge
2) Download ROUGE-1.5.5. You may download it from https://github.com/andersjo/pyrouge/tree/master/tools/ROUGE-1.5.5
3) pyrouge comes with a Python script named pyrouge_set_rouge_path

RDF representation of sentences

Question: I need to represent sentences in RDF format. In other words, "John likes coke" would be automatically represented as:
    Subject: John
    Predicate: Likes
    Object: Coke
Does anyone know where I should start? Are there any programs which can do this automatically, or would I need to do everything from scratch?
Answer 1: It looks like you want the typed dependencies of a sentence, e.g. for "John likes coke":
    nsubj(likes-2, John-1)
    dobj(likes-2, coke-3)
I'm not aware of any dependency parser that directly
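To illustrate how typed dependencies relate to subject/predicate/object triples, here is a minimal sketch, not the answerer's method. It assumes the dependencies are already available as strings in the `relation(head-index, dependent-index)` form shown above, and it only pairs `nsubj` with `dobj` on the same head. The function names are hypothetical, and producing real RDF (URIs, serialization) is left out.

```python
import re

# Matches strings such as "nsubj(likes-2, John-1)".
DEP_PATTERN = re.compile(r"(\w+)\((.+)-(\d+),\s*(.+)-(\d+)\)")


def parse_dependency(text):
    """Split one typed dependency into (relation, (head, idx), (dependent, idx))."""
    match = DEP_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError("not a typed dependency: %r" % text)
    relation, head, head_idx, dep, dep_idx = match.groups()
    return relation, (head, int(head_idx)), (dep, int(dep_idx))


def dependencies_to_triples(dependencies):
    """Pair each nsubj with each dobj that shares the same head word."""
    subjects = {}
    objects = {}
    for text in dependencies:
        relation, head, dependent = parse_dependency(text)
        if relation == "nsubj":
            subjects.setdefault(head, []).append(dependent[0])
        elif relation == "dobj":
            objects.setdefault(head, []).append(dependent[0])

    triples = []
    for head, subject_words in subjects.items():
        for subject in subject_words:
            for obj in objects.get(head, []):
                # head is (word, index); the word serves as the predicate
                triples.append((subject, head[0], obj))
    return triples


if __name__ == "__main__":
    deps = ["nsubj(likes-2, John-1)", "dobj(likes-2, coke-3)"]
    print(dependencies_to_triples(deps))
```

Sentences without a direct object, passives, and multi-word entities would need additional relations and are not handled here.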

Using Arabic WordNet for synonyms in python?

Question: I am trying to get the synonyms for Arabic words in a sentence. If the word is in English it works perfectly, and the results are displayed in Arabic. I was wondering if it's possible to get the synonyms of an Arabic word directly, without writing it in English first. I tried that but it didn't work, and I would prefer to write the word without tashkeel: انتظار instead of اِنْتِظار
    from nltk.corpus import wordnet as omw
    jan = omw.synsets('انتظار ')[0]
    print(jan)
    print(jan.lemma_names(lang='arb'))
Answer 1:
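The answer text is not part of this excerpt. As a hedged sketch only, and not the original answer: NLTK's `wordnet.synsets` accepts a `lang` argument, so an Arabic word can be looked up directly with `lang='arb'` once the `wordnet` and `omw-1.4` corpora are downloaded. The helper names below are hypothetical. Stripping diacritics before lookup is an assumption about how the lemmas are stored, so verify it against your data.

```python
import unicodedata


def strip_tashkeel(word):
    """Remove combining marks (Unicode category Mn), which include Arabic tashkeel."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


def arabic_synonyms(word):
    """Collect Arabic lemma names from every synset that contains `word`.

    Requires NLTK plus the 'wordnet' and 'omw-1.4' corpora
    (nltk.download('wordnet'); nltk.download('omw-1.4')).
    """
    # Imported lazily so strip_tashkeel can be used without NLTK installed.
    from nltk.corpus import wordnet

    # Trim stray whitespace and diacritics, then query in Arabic directly.
    query = strip_tashkeel(word.strip())
    lemmas = set()
    for synset in wordnet.synsets(query, lang="arb"):
        lemmas.update(synset.lemma_names(lang="arb"))
    return sorted(lemmas)
```

Note that the lookup in the question passes a string with a trailing space and no `lang` argument, either of which may explain an empty result.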

nltk StanfordNERTagger : How to get proper nouns without capitalization

Question: I am trying to use the StanfordNERTagger and nltk to extract keywords from a piece of text.
    docText="John Donk works for POI. Brian Jones wants to meet with Xyz Corp. for measuring POI's Short Term performance Metrics."
    words = re.split("\W+",docText)
    stops = set(stopwords.words("english"))
    #remove stop words from the list
    words = [w for w in words if w not in stops and len(w) > 2]
    str = " ".join(words)
    print str
    stn = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
    stp =

Calculating context-sensitive text correlation

Question: Suppose I want to match address records (or person names or whatever) against each other to merge records that are most likely referring to the same address. Basically, I guess I would like to calculate some kind of correlation between the text values and merge the records if this value is over a certain threshold. Example: "West Lawnmower Drive 54 A" is probably the same as "W. Lawn Mower Dr. 54A" but different from "East Lawnmower Drive 54 A". How would you approach this problem? Would it
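One possible starting point, offered as a sketch under stated assumptions rather than an answer from the thread: normalize each address (lowercase, drop punctuation, expand a few abbreviations, ignore spacing) and then score the normalized strings with `difflib.SequenceMatcher` from the standard library. The `ABBREVIATIONS` table, the function names, and the default threshold are all hypothetical and would need tuning on real data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny abbreviation table; extend for real data.
ABBREVIATIONS = {
    "w": "west", "e": "east", "n": "north", "s": "south",
    "dr": "drive", "st": "street", "ave": "avenue",
}


def normalize(address):
    """Lowercase, drop punctuation, expand abbreviations, remove spacing."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    # Joining without spaces makes "Lawn Mower" and "Lawnmower" identical.
    return "".join(tokens)


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_match(a, b, threshold=0.97):
    """Merge candidates only when the score reaches the (assumed) threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    base = "West Lawnmower Drive 54 A"
    for other in ("W. Lawn Mower Dr. 54A", "East Lawnmower Drive 54 A"):
        print(other, round(similarity(base, other), 3))
```

With this normalization the first two example addresses become identical, while the "East" variant differs only in its first token and still scores high on character similarity. That is why the threshold has to sit close to 1, or why direction and house-number tokens may deserve a separate exact comparison.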

Extracting Country Name from Author Affiliations

Question: I am currently exploring the possibility of extracting the country name from author affiliations (PubMed articles). My sample data looks like:
    Mechanical and Production Engineering Department, National University of Singapore.
    Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, U.K.
    Cancer Research Campaign Mammalian Cell DNA Repair Group, Department of Zoology, Cambridge, UK.
    Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, IN 46285.

How can I best determine the correct capitalization for a word?

Question: I have a database containing sentences which only contain capitalized letters. The database is technical, containing medical terms, and I want to normalize it so that the capitalization is (close to) what the user expects. What is the best way to achieve this? Is there a freely available dataset I can use to help with the process?
Answer 1: One way could be to infer capitalization from POS tagging, for example using the Python Natural Language Toolkit (NLTK):
    import nltk, re
    def truecase(text):

Lemmatizing POS tagged words with NLTK?

Question: I have POS tagged some words with nltk.pos_tag(), so they are given Treebank tags. I would like to lemmatize these words using the known POS tags, but I am not sure how. I was looking at the WordNet lemmatizer, but I am not sure how to convert the Treebank POS tags to tags accepted by the lemmatizer. How can I perform this conversion simply, or is there a lemmatizer that uses Treebank tags?
Answer 1: The WordNet lemmatizer only knows four parts of speech (ADJ, ADV, NOUN, and VERB) and only the NOUN
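A common way to bridge the two tag sets, shown here as a sketch rather than the answer's own code, is to map each Treebank tag by its first letter to one of the four WordNet categories. The single-letter values below match the constants `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN`, and `wordnet.ADV` in NLTK. The function names are hypothetical, and falling back to noun for unmapped tags is an assumption that mirrors the lemmatizer's default `pos='n'`.

```python
# Values of nltk.corpus.wordnet.ADJ / VERB / NOUN / ADV, written out so the
# mapping itself does not need NLTK installed.
WORDNET_ADJ, WORDNET_VERB, WORDNET_NOUN, WORDNET_ADV = "a", "v", "n", "r"


def treebank_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code, or None if unmapped."""
    if tag.startswith("J"):
        return WORDNET_ADJ
    if tag.startswith("V"):
        return WORDNET_VERB
    if tag.startswith("N"):
        return WORDNET_NOUN
    if tag.startswith("R"):
        return WORDNET_ADV
    return None


def lemmatize_tagged(tagged_words):
    """Lemmatize (word, treebank_tag) pairs, e.g. the output of nltk.pos_tag().

    Requires NLTK and its 'wordnet' corpus.
    """
    # Imported lazily so treebank_to_wordnet can be used on its own.
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    lemmas = []
    for word, tag in tagged_words:
        # Assumption: treat unmapped tags (determiners, etc.) as nouns.
        pos = treebank_to_wordnet(tag) or WORDNET_NOUN
        lemmas.append(lemmatizer.lemmatize(word, pos=pos))
    return lemmas
```

Note that `RP` (particle) also starts with "R" and is mapped to adverb by this first-letter rule; handle it separately if that matters for your text.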

finding noun and verb in stanford parser

Question: I need to find whether a word is a verb, a noun, or both. For example, the word "search" can be both a noun and a verb, but the Stanford parser gives it the NN tag. Is there any way to make the Stanford parser report that "search" is both a noun and a verb? The code that I use now:
    public static String Lemmatize(String word) {
        WordTag w = new WordTag(word);
        w.setTag(POSTagWord(word));
        Morphology m = new Morphology();
        WordLemmaTag wT = m.lemmatize(w);
        return wT.lemma();
    }
Or should I use any other