I have a pandas DataFrame with one column, let's call it 'col'. Each entry of this column is a list of words: ['word1', 'word2', etc.]
How can I efficient
You can use apply from pandas with a function that lemmatizes each word in the given string. Note that there are many ways to tokenize your text. If you use a whitespace tokenizer, you might have to remove symbols like . first, because they stay attached to the neighbouring word.
Below is an example of how to lemmatize a column of an example dataframe.
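import nltk
import pandas as pd
# WordNetLemmatizer needs the WordNet corpus: run nltk.download('wordnet') once
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = nltk.stem.WordNetLemmatizer()

def lemmatize_text(text):
    # Split on whitespace, then lemmatize each token (as a noun, the default POS)
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)]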
df = pd.DataFrame(['this was cheesy', 'she likes these books', 'wow this is great'], columns=['text'])
df['text_lemmatized'] = df.text.apply(lemmatize_text)
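
Since the question describes a column whose entries are already lists of words, the tokenizing step can probably be skipped. The sketch below shows one way to do that, and caches each lemma so repeated words are only looked up once. The names make_list_lemmatizer, toy_lemmatize and TOY_LEMMAS are made up for illustration; the toy lookup is only a stand-in so the snippet runs without the WordNet data, and in practice you would pass nltk.stem.WordNetLemmatizer().lemmatize instead.

```python
from functools import lru_cache

import pandas as pd


def make_list_lemmatizer(lemmatize_word):
    """Wrap a word-level lemmatizer so it works on a list of tokens."""
    # Cache results: the same word usually appears in many rows.
    cached = lru_cache(maxsize=None)(lemmatize_word)

    def lemmatize_tokens(tokens):
        return [cached(w) for w in tokens]

    return lemmatize_tokens


# Hypothetical stand-in for a real lemmatizer (assumption, not NLTK's API).
# Swap in nltk.stem.WordNetLemmatizer().lemmatize for real use.
TOY_LEMMAS = {"books": "book", "likes": "like", "cats": "cat"}


def toy_lemmatize(word):
    return TOY_LEMMAS.get(word, word)


# Each entry of 'col' is already a list of words, as in the question.
df = pd.DataFrame({"col": [["she", "likes", "books"], ["cats", "likes", "books"]]})

lemmatize_tokens = make_list_lemmatizer(toy_lemmatize)
df["col_lemmatized"] = df["col"].apply(lemmatize_tokens)
```

Whether the cache helps depends on how often words repeat; on a small vocabulary spread over many rows it should save most of the lookups.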