nltk

How to provide (or generate) tags for nltk lemmatizers

Submitted by ≯℡__Kan透↙ on 2021-02-19 06:14:24
Question: I have a set of documents that I would like to transform into a form that lets me compute tf-idf for the words in those documents (so that each document is represented by a vector of tf-idf numbers). I thought it would be enough to call WordNetLemmatizer.lemmatize(word) and then PorterStemmer, but 'have', 'has', 'had', etc. are not all transformed to 'have' by the lemmatizer, and the same goes for other words. Then I read that I am supposed to provide a hint
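
As a rough sketch of the kind of hint the question is asking about (a common approach, not taken from the post itself): nltk's pos_tag produces Penn Treebank tags, which can be mapped to WordNet POS constants and passed to lemmatize() so that verb forms such as 'has' and 'had' reduce to 'have'. The tag-mapping helper below is a hypothetical name, and the usual nltk data packages (punkt, averaged_perceptron_tagger, wordnet) are assumed to be downloaded.

    from nltk import pos_tag, word_tokenize
    from nltk.corpus import wordnet
    from nltk.stem import WordNetLemmatizer

    def penn_to_wordnet(tag):
        # Map a Penn Treebank tag to the corresponding WordNet POS constant.
        if tag.startswith('J'):
            return wordnet.ADJ
        if tag.startswith('V'):
            return wordnet.VERB
        if tag.startswith('R'):
            return wordnet.ADV
        return wordnet.NOUN  # lemmatize() defaults to noun when no POS is given

    lemmatizer = WordNetLemmatizer()
    for word, tag in pos_tag(word_tokenize("She has had several documents")):
        # 'has' and 'had' are tagged as verbs, so they now lemmatize to 'have'
        print(word, lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)))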

Passing a pandas dataframe column to an NLTK tokenizer

Submitted by 纵然是瞬间 on 2021-02-18 12:59:15
Question: I have a pandas DataFrame raw_df with two columns, ID and sentences. I need to convert each sentence to a string. The code below produces no errors and reports the column's dtype as "object": raw_df['sentences'] = raw_df.sentences.astype(str) raw_df.sentences.dtypes Out: dtype('O') Then I try to tokenize the sentences and get a TypeError saying the method expects a string or bytes-like object. What am I doing wrong? raw_sentences = tokenizer.tokenize(raw_df) The same TypeError occurs for raw_sentences = nltk
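
One plausible fix, sketched below under the assumption that the goal is to tokenize each cell of the sentences column: nltk tokenizers expect a single string, so the call has to be applied per row (for example with Series.apply) rather than to the whole DataFrame. The toy data here is invented for illustration.

    import pandas as pd
    from nltk.tokenize import word_tokenize

    # Toy DataFrame standing in for raw_df from the question.
    raw_df = pd.DataFrame({'ID': [1, 2],
                           'sentences': ['This is one sentence.', 'Here is another one.']})

    # word_tokenize expects a single string, so apply it cell by cell
    # instead of passing the whole DataFrame to the tokenizer.
    raw_df['tokens'] = raw_df['sentences'].astype(str).apply(word_tokenize)
    print(raw_df['tokens'].tolist())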

Extracting person names with named entity recognition in NLP using Python

Submitted by 限于喜欢 on 2021-02-18 12:20:27
Question: I have a sentence for which I need to identify only the person names. For example: sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin" I used the code below to identify the named entities: from nltk import word_tokenize, pos_tag, ne_chunk print(ne_chunk(pos_tag(word_tokenize(sentence)))) The output I received was: (S (PERSON Larry/NNP) (ORGANIZATION Page/NNP) is/VBZ an/DT (GPE American/JJ) business/NN magnate/NN and
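
A minimal sketch of one common way to collect only the PERSON chunks from the ne_chunk tree (not the accepted answer, just an illustration); note that, as the output above shows, the default chunker may mislabel parts of a name, so the result is only as good as the underlying model.

    from nltk import word_tokenize, pos_tag, ne_chunk
    from nltk.tree import Tree

    sentence = ("Larry Page is an American business magnate and computer scientist "
                "who is the co-founder of Google, alongside Sergey Brin")

    persons = []
    for subtree in ne_chunk(pos_tag(word_tokenize(sentence))):
        # Named entities come back as Tree nodes whose label is the entity type.
        if isinstance(subtree, Tree) and subtree.label() == 'PERSON':
            persons.append(' '.join(token for token, tag in subtree.leaves()))

    print(persons)  # e.g. ['Larry', 'Sergey Brin'] -- 'Page' is mis-chunked as ORGANIZATION here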

How to test whether a word is in singular form or not in Python?

Submitted by ﹥>﹥吖頭↗ on 2021-02-17 19:14:54
Question: I am trying to determine whether a word is singular or plural using nltk's pos_tag, but the results are not accurate. How can I find out whether a word is in singular or plural form? Moreover, I need to do it without using any Python package. Answer 1: For English, every word should have a root lemma whose default number is singular. Assuming that you have only nouns in your list, you can try this: from nltk.stem import WordNetLemmatizer wnl =
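
A minimal sketch of the idea the (truncated) answer points toward: lemmatize the noun with WordNet and treat a change as evidence of a plural form. The helper name is made up for illustration, and this only works for regular nouns that WordNet knows about.

    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()

    def is_plural_noun(word):
        # A plural noun lemmatizes to a different (singular) form; a singular noun is unchanged.
        return wnl.lemmatize(word, 'n') != word

    print(is_plural_noun('documents'))  # True
    print(is_plural_noun('document'))   # False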

Check the similarity between two words with NLTK with Python

Submitted by ﹥>﹥吖頭↗ on 2021-02-17 16:35:38
Question: I have two lists and I want to check the similarity between each pair of words across the two lists and find the maximum similarity. Here is my code: from nltk.corpus import wordnet list1 = ['Compare', 'require'] list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'how', 'identify', 'label', 'list', 'listen', 'locate', 'match', 'memorise', 'name', 'observe', 'omit', 'quote', 'read', 'recall', 'recite', 'recognise', 'record', 'relate', 'remember', 'repeat', 'reproduce', 'retell', 'select', 'show
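
A rough sketch of one way to finish this (not the original answer): take the first synset of each word, score every cross-list pair with Wu-Palmer similarity, and keep the best pair. Only a shortened version of list2 is shown here, and picking the first synset is a simplifying assumption.

    from nltk.corpus import wordnet

    list1 = ['Compare', 'require']
    list2 = ['choose', 'copy', 'define', 'duplicate', 'find', 'identify', 'label']  # truncated for brevity

    best_pair, best_score = None, 0.0
    for w1 in list1:
        for w2 in list2:
            syns1, syns2 = wordnet.synsets(w1), wordnet.synsets(w2)
            if syns1 and syns2:
                # Wu-Palmer similarity of the first sense of each word (may be None).
                score = syns1[0].wup_similarity(syns2[0])
                if score is not None and score > best_score:
                    best_pair, best_score = (w1, w2), score

    print(best_pair, best_score)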

Turning a sentence from first to second person

Submitted by 十年热恋 on 2021-02-17 07:06:40
Question: I'm trying to write a Python script using nltk that swaps first and second person in a sentence. Example: the sentence "I went to see Avatar and you came with me" should become "You went to see Avatar and I came with you". Is there a built-in function in nltk that does this? Answer 1: There shouldn't be too many forms of personal and possessive pronouns in English. If you create a dictionary of correspondences between first- and second-person forms, you can then tokenize the original sentence
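
A minimal sketch of the dictionary-based approach the answer describes: map each pronoun form to its counterpart, tokenize, and substitute token by token. The mapping below is illustrative and deliberately naive; it does not handle capitalization or the subject/object ambiguity of 'you'.

    from nltk import word_tokenize

    swap = {'I': 'you', 'me': 'you', 'my': 'your', 'mine': 'yours',
            'you': 'I', 'your': 'my', 'yours': 'mine'}

    sentence = "I went to see Avatar and you came with me"
    tokens = word_tokenize(sentence)
    # Replace each pronoun with its counterpart, leaving other tokens untouched.
    print(' '.join(swap.get(tok, tok) for tok in tokens))
    # -> "you went to see Avatar and I came with you" (capitalization not handled)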
