nlp

finding super classes of an entity in SPARQL

无人久伴 submitted on 2019-12-24 09:13:09
Question: I want to build a Named Entity Recognizer using Wikipedia data, so I need to get all the super classes of a word to see which category (Place, Human, Organization, or None) the word belongs to. I searched the Internet a lot and found pages such as "SPARQL query to find all sub classes and a super class of a given class", but when I execute the query it returns "No matching records found", even with the word mentioned in the page and after trying other namespaces, and "Extracting hierarchy for dbpedia entity using…"
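
A minimal sketch of one way to pull the superclass chain of a DBpedia entity from Python, assuming the SPARQLWrapper package and the public DBpedia endpoint; dbr:Berlin is only an illustrative entity, not one from the question.

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dbr:  <http://dbpedia.org/resource/>
        SELECT DISTINCT ?superClass WHERE {
            dbr:Berlin a ?class .                   # every class of the entity
            ?class rdfs:subClassOf* ?superClass .   # and its transitive superclasses
        }
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # One binding per (possibly transitive) superclass URI.
    for row in results["results"]["bindings"]:
        print(row["superClass"]["value"])

Walking rdfs:subClassOf* up from the entity's rdf:type classes is what lets you decide whether the entity falls under something like Place, Person, or Organisation in the DBpedia ontology.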

How to read and write a TermDocumentMatrix in R?

自古美人都是妖i submitted on 2019-12-24 06:36:20
Question: I made a word cloud from a CSV file in R, using the TermDocumentMatrix method in the tm package. Here is my code:

    csvData <- read.csv("word", encoding = "UTF-8", stringsAsFactors = FALSE)
    Encoding(csvData$content) <- "UTF-8"
    # useSejongDic() - KoNLP package
    nouns <- sapply(csvData$content, extractNoun, USE.NAMES = F)
    # create Corpus
    myCorpus <- Corpus(VectorSource(nouns))
    myCorpus <- tm_map(myCorpus, removePunctuation)
    # remove numbers
    myCorpus <- tm_map(myCorpus, removeNumbers)
    # remove StopWord…

How to find a word whose first letter is capital and the other letters are lower

懵懂的女人 submitted on 2019-12-24 05:56:20
Question: Problem statement: filter those words from the complete set of text6 that have the first letter in upper case and all other letters in lower case. Store the result in the variable title_words. Print the number of words present in title_words. I have tried every possible way to find the answer but don't know where I am going wrong.

    import nltk
    from nltk.book import text6

    title_words = 0
    for item in set(text6):
        if item[0].isupper() and item[1:].islower():
            title_words += 1
    print(title_words)

I have tried in…
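
A minimal sketch of one likely fix, assuming the exercise wants title_words to hold the filtered words themselves (the excerpt never shows the expected output, so that reading is an assumption):

    from nltk.book import text6

    # Keep the qualifying words rather than a running count: first character
    # upper case, every remaining character lower case.
    title_words = [w for w in set(text6) if w[0].isupper() and w[1:].islower()]
    print(len(title_words))

The original loop only counts matches, so title_words ends up as an integer instead of the collection the problem statement asks to store.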

What is the accuracy of nltk pos_tagger?

丶灬走出姿态 submitted on 2019-12-24 05:30:34
Question: I'm writing a dissertation and I am using nltk.pos_tag in my work. I can't find any information about the accuracy of this algorithm. Does anybody know where I can find such information?

Answer 1: NLTK's default POS tagger, pos_tag, is a MaxEnt tagger; see line 82 of https://github.com/nltk/nltk/blob/develop/nltk/tag/__init__.py

    from nltk.corpus import brown
    from nltk.data import load

    sents = brown.tagged_sents()
    # test on last 10% of brown corpus.
    numtest = len(sents) / 10
    testsents = sents[numtest…
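
A minimal sketch of that kind of held-out evaluation, assuming the universal tagset so pos_tag's Penn Treebank output and Brown's annotations are comparable; the split and the accuracy loop are illustrative, not the answer's exact code (which is truncated above).

    import nltk
    from nltk.corpus import brown

    # Map both the gold annotations and pos_tag's output onto the universal
    # tagset so the two tag inventories line up.
    sents = brown.tagged_sents(tagset="universal")
    numtest = len(sents) // 10
    testsents = sents[-numtest:]          # last 10% of the Brown corpus

    correct = total = 0
    for gold in testsents:
        words = [w for w, _ in gold]
        predicted = nltk.pos_tag(words, tagset="universal")
        for (_, gold_tag), (_, pred_tag) in zip(gold, predicted):
            correct += gold_tag == pred_tag
            total += 1

    print("accuracy: %.3f" % (correct / total))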

Determine if a list of words is in a sentence?

丶灬走出姿态 submitted on 2019-12-24 04:53:08
Question: Is there a way (Pattern, Python, NLTK, etc.) to detect whether a sentence has a list of words in it? For example: "The cat ran into the hat, box, and house." | the list would be "hat, box, and house". This could be string-processed, but we may have more generic lists, e.g.: "The cat likes to run outside, run inside, or jump up the stairs." | list = "run outside, run inside, or jump up the stairs". The list could be in the middle of a paragraph or at the end of the sentence, which further complicates things. I've been…
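
One possible sketch using spaCy's PhraseMatcher (an assumption; the question doesn't name a library, and the v3-style matcher API and the en_core_web_sm model are further assumptions):

    import spacy
    from spacy.matcher import PhraseMatcher

    nlp = spacy.load("en_core_web_sm")
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")   # case-insensitive matching

    phrases = ["run outside", "run inside", "jump up the stairs"]
    matcher.add("LIST", [nlp.make_doc(p) for p in phrases])

    doc = nlp("The cat likes to run outside, run inside, or jump up the stairs.")
    found = {doc[start:end].text.lower() for _, start, end in matcher(doc)}

    # The sentence "contains the list" when every phrase was found somewhere in it.
    print(found, {p.lower() for p in phrases} <= found)

Because the matcher works over tokens rather than raw substrings, the phrases are found whether they sit in the middle of a paragraph or at the end of a sentence.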

Is it possible to get a natural word after it has been stemmed?

房东的猫 submitted on 2019-12-24 04:41:29
Question: I have the word play, which after stemming has become plai. Now I want to get play back again. Is that possible? I have used the Porter stemmer.

Answer 1: A stemmer is able to process artificial, non-existing words. Would you like those to be returned as elements of the set of all possible words? How do you know that a word doesn't exist and shouldn't be returned? As an option: find a dictionary of all words and their forms, find a stem for every one of them, and save this projection as a map: (stem, list of all word…
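
A minimal sketch of that stem-to-words map, assuming NLTK's Porter stemmer and its words corpus as the dictionary (any dictionary of word forms would do):

    from collections import defaultdict
    from nltk.corpus import words
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    # Invert the stemmer over a dictionary: each stem points back to every
    # surface form that produced it.
    unstem = defaultdict(list)
    for w in words.words():
        unstem[stemmer.stem(w)].append(w)

    print(unstem["plai"])   # all dictionary words whose Porter stem is "plai"

Looking a stem up in this map gives the candidate natural words; choosing among them (for example by corpus frequency) is a separate decision the stemmer itself cannot make.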

nltk interface to stanford parser [duplicate]

孤街醉人 submitted on 2019-12-24 04:37:07
Question: This question already has answers here: Stanford Parser and NLTK (18 answers). Closed 3 years ago. I am having problems accessing the Stanford parser through Python NLTK (they developed an interface for NLTK):

    import nltk.tag.stanford
    Traceback (most recent call last):
      File "", line 1, in
    ImportError: No module named stanford

Answer 1: You can use the Stanford parser from NLTK. Check this link on how to use it: http://www.nltk.org/api/nltk.tag.html#module-nltk.tag.stanford I guess it isn't a problem with…
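
For reference, a sketch of how that module is typically used once it imports; this assumes an NLTK version that still ships nltk.tag.stanford, the jar and model paths are placeholders rather than real locations, and Java must be installed:

    from nltk.tag.stanford import StanfordPOSTagger

    # Placeholders: point these at your own Stanford POS tagger download.
    jar = "/path/to/stanford-postagger.jar"
    model = "/path/to/english-bidirectional-distsim.tagger"

    tagger = StanfordPOSTagger(model, jar)
    print(tagger.tag("I am testing the Stanford tagger".split()))

If the import itself fails with "No module named stanford", the installed NLTK may simply be too old to include the interface.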

Java library/api which converts language code to language name

扶醉桌前 submitted on 2019-12-24 03:49:15
Question: Is there a Java library/API which, given an ISO language code, returns the corresponding language name? For example, zh-cn should return Chinese, en should return English, and so on.

Answer 1: The Java Locale class can do this:

    new Locale("zh", "cn").getDisplayName();  // --> Chinese (China)

You just have to parse the language/country names.

Answer 2: You don't need a library; you can use java.util.Locale for this:

    Locale locale = new Locale("zh", "cn");
    System.out.println(locale.getDisplayLanguage());

This…

Matcher is returning some duplicate entries

喜你入骨 submitted on 2019-12-24 03:46:04
Question: I want the output to be ["good customer service", "great ambience"], but I am getting ["good customer", "good customer service", "great ambience"] because the pattern also matches "good customer", and that phrase doesn't make any sense on its own. How can I remove this kind of duplicate?

    import spacy
    from spacy.matcher import Matcher

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("good customer service and great ambience")

    matcher = Matcher(nlp.vocab)
    # Create a pattern matching two tokens: adjective followed by…
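
One common way to drop the shorter overlapping matches is spacy.util.filter_spans, which keeps only the longest span among overlapping ones; a sketch, assuming spaCy v3's Matcher.add signature and an illustrative adjective-plus-nouns pattern (the question's own pattern is cut off above):

    import spacy
    from spacy.matcher import Matcher
    from spacy.util import filter_spans

    nlp = spacy.load("en_core_web_sm")
    matcher = Matcher(nlp.vocab)

    # Illustrative pattern: an adjective followed by one or more nouns.
    matcher.add("ADJ_NOUN", [[{"POS": "ADJ"}, {"POS": "NOUN", "OP": "+"}]])

    doc = nlp("good customer service and great ambience")
    spans = [doc[start:end] for _, start, end in matcher(doc)]

    # filter_spans keeps the longest of any overlapping spans, so the partial
    # match "good customer" is discarded in favour of "good customer service".
    print([span.text for span in filter_spans(spans)])

In spaCy v3 the same effect can also be obtained by passing greedy="LONGEST" when adding the pattern to the Matcher.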

Translate unicode emojis to ascii emojis in Python

感情迁移 submitted on 2019-12-24 03:45:05
Question: Is there a way to translate Unicode emojis to an appropriate ASCII emoticon in Python? I know the emoji library, which can be used to convert Unicode emojis to something like :crying_face:, but what I need is to convert them to :'( . Is there an elegant way to do this without having to translate every possible emoji manually? Another option would be to convert the ASCII emoticons to their textual representation as well, i.e. :'( should become :crying_face:. My intermediate goal is to find a way to…
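
No ready-made emoji-name-to-emoticon table seems to exist, so one sketch (under that assumption) goes through emoji.demojize and a small hand-written mapping; the table below is purely illustrative and far from complete, and the :name: strings assume the emoji library's default naming:

    import emoji

    # Hand-written, illustrative table from demojized names to ASCII emoticons.
    NAME_TO_ASCII = {
        ":crying_face:": ":'(",
        ":winking_face:": ";)",
        ":smiling_face_with_smiling_eyes:": ":)",
    }

    def to_ascii(text):
        # First turn every Unicode emoji into its :name: form...
        text = emoji.demojize(text)
        # ...then swap the names we have a table entry for into emoticons.
        for name, ascii_form in NAME_TO_ASCII.items():
            text = text.replace(name, ascii_form)
        return text

    print(to_ascii("I missed the bus 😢"))

Names without a table entry stay in their :name: form, which matches the question's fallback of keeping a textual representation.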