How to run naive Bayes from NLTK with Python Pandas?

人盡茶涼 submitted on 2020-01-15 12:16:07

Question


I have a CSV file with a feature column (people's names) and a label column (people's ethnicities). I can load it into a data frame with Python Pandas, but when I pass that data to the NLTK module to train a naive Bayes classifier, I get the following error:

Traceback (most recent call last):
  File "C:\Users\Desktop\file.py", line 19, in <module>
    classifier = nbc.train(train_set)
  File "E:\Program Files Extra\Python27\lib\site-packages\nltk\classify\naivebayes.py", line 194, in train
    for fname, fval in featureset.items():
AttributeError: 'str' object has no attribute 'items'

Here is my code:

import pandas as pd
from pandas import DataFrame
import re
import numpy as np
import nltk
from nltk.classify import NaiveBayesClassifier as nbc

data = pd.read_csv("C:\Users\KubiK\Desktop\OddNames_sampleData3.csv")
frame = DataFrame(data)
frame.columns = ["feature", "label"]
feature = frame.feature
label = frame.label

# Extract features. 
featuresets = [(feature, label) for index, (feature, label) in frame.iterrows()]
# Split train and test set
train_set, test_set = featuresets[:400], featuresets[400:]
# Train a classifier
classifier = nbc.train(train_set)
# Test classifier on "Silva"
print classifier.classify(ethnic_features('Silva'))

Sample data:

Name                 Ethnicity
J-b'te Letourneau    Scotish
Jane Mc-earthar      French
Li Chen              Chinese
Amabil?? Bonneau     English
Emma Lef??c          French
C., Akeefe           African
D, James Matheson    English

Answer 1:


This line

featuresets = [(feature, label) for index, (feature, label) in frame.iterrows()]

is choking nbc.train()

featuresets should be of the form [(featureset, label)], where each featureset is a dict (not a str) mapping feature names to values, and label is the known class label for that featureset.
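To make the difference concrete, here is a minimal sketch in plain Python (no NLTK needed). The helper iterate_like_train is hypothetical: it only imitates the featureset.items() loop shown in the traceback, not the real training code, and the feature names in the dict are made up for illustration.

```python
def iterate_like_train(labeled_featuresets):
    """Walk the data the way the traceback shows nbc.train() doing it."""
    seen = []
    for featureset, label in labeled_featuresets:
        # This is the call that fails when featureset is a plain string.
        for fname, fval in featureset.items():
            seen.append((fname, fval, label))
    return seen


# What the question's code builds: the raw name string paired with the label.
bad = [("Li Chen", "Chinese")]

# What is expected: a dict of feature name -> value paired with the label.
# These feature names are invented for this example.
good = [({"first_letter": "l", "last_letter": "n"}, "Chinese")]

try:
    iterate_like_train(bad)
    bad_error = None
except AttributeError as exc:
    bad_error = str(exc)

good_result = iterate_like_train(good)

print(bad_error)
print(good_result)
```

The string version fails on .items() exactly as in the question, while the dict version is walked without complaint.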

So it should be

featuresets = [(ethnic_features(feature), label) for index, (feature, label) in frame.iterrows()]

You didn't include ethnic_features() in your snippet, so I am assuming it returns a dict.
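Since ethnic_features() is not shown in the question, the following is only a guess at what it might look like, written for Python 3 rather than the Python 2.7 in the traceback. The features it extracts (first letter, last letter, last two letters, length) are assumptions chosen for illustration, and the small inline DataFrame stands in for the CSV file, reusing a few rows from the sample data.

```python
import pandas as pd
from nltk.classify import NaiveBayesClassifier as nbc


def ethnic_features(name):
    """Hypothetical feature extractor: returns a dict, as nbc.train() requires."""
    cleaned = name.strip().lower()
    return {
        "first_letter": cleaned[:1],
        "last_letter": cleaned[-1:],
        "last_two": cleaned[-2:],
        "length": len(cleaned),
    }


# Tiny stand-in for the CSV file, using rows from the sample data.
frame = pd.DataFrame(
    [
        ("Jane Mc-earthar", "French"),
        ("Li Chen", "Chinese"),
        ("D, James Matheson", "English"),
    ],
    columns=["feature", "label"],
)

# Pair each name's feature dict with its label.
featuresets = [
    (ethnic_features(name), label)
    for name, label in zip(frame["feature"], frame["label"])
]

classifier = nbc.train(featuresets)
prediction = classifier.classify(ethnic_features("Silva"))
print(prediction)
```

With only three training rows the prediction is not meaningful; the point is that nbc.train() accepts the data once every featureset is a dict. Zipping the two columns is used here instead of iterrows() purely for brevity; the list comprehension from the answer works the same way.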



Source: https://stackoverflow.com/questions/29337714/how-to-run-naive-bayes-from-nltk-with-python-pandas
