Tokenizing words into a new column in a pandas dataframe

轻奢々 2021-01-28 00:45

I am trying to go through a list of comments stored in a pandas DataFrame, tokenize each comment, and put the resulting words in a new column of the DataFrame, but I am having an e

2 Answers
  •  半阙折子戏
    2021-01-28 01:37

    The way you apply the lambda function is correct; it is the way you define addwords that doesn't work.

    apwords is a standalone function, not a method of the string object, so call it with the string as its argument:

    addwords = lambda x: apwords(x)
    

    And not:

    addwords = lambda x: x.apwords()
    

    If you want to call apwords as a method (x.apwords()), you would need to define a class that inherits from str and define apwords as a method of that class.
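For illustration only, here is a minimal sketch of what that subclass approach could look like. The class name `TokenizableStr` is hypothetical, and a small regex stands in for `word_tokenize` so the snippet runs without NLTK data; this is an assumption, not the tokenizer from the question.

```python
import re


class TokenizableStr(str):
    """Hypothetical str subclass that exposes apwords as a method."""

    def apwords(self):
        # Stand-in tokenizer (assumption): runs of word characters,
        # plus each punctuation mark as its own token.
        # With NLTK installed, word_tokenize(self) could be used here instead.
        return re.findall(r"\w+|[^\w\s]", self)


text = TokenizableStr("The parcel arrived late, again.")
tokens = text.apwords()
print(tokens)
```

Note that a pandas column holds plain `str` objects, so every value would first have to be wrapped in the subclass before `x.apwords()` could work.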

    It is far easier to stay with the function:

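    from nltk.tokenize import word_tokenize  # needs NLTK's tokenizer data to be downloaded

    def apwords(words):
        # Split one comment string into a list of word tokens.
        filtered_sentence = []
        words = word_tokenize(words)
        for w in words:
            filtered_sentence.append(w)
        return filtered_sentence

    # Wrap the function in a lambda and apply it to every row of the column.
    addwords = lambda x: apwords(x)
    df['words'] = df['complaint'].apply(addwords)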
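Because `Series.apply` accepts any callable, the lambda wrapper is optional and the function can be passed directly. The sketch below assumes a toy DataFrame and uses a hypothetical regex-based `simple_tokenize` in place of `word_tokenize` so that it runs without NLTK data. The guard for non-string values is an added assumption about the data, not something the question states.

```python
import re

import pandas as pd


def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize:
    # runs of word characters, plus each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


def apwords(text):
    # Missing comments (None/NaN) are not strings; return an empty list for them.
    if not isinstance(text, str):
        return []
    return simple_tokenize(text)


# Toy data; the column name 'complaint' follows the answer above.
df = pd.DataFrame({"complaint": ["Late delivery, again.", None, "No refund"]})

# Pass the function itself; no lambda wrapper is needed.
df["words"] = df["complaint"].apply(apwords)
print(df["words"].tolist())
```

To use NLTK instead, replace the body of `simple_tokenize` with a call to `word_tokenize(text)`.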
    
