Mixing categorical and continuous data in Naive Bayes classifier using scikit-learn

逝去的感伤 2020-12-02 05:56

I'm using scikit-learn in Python to develop a classification algorithm to predict the gender of certain customers. Among others, I want to use the Naive Bayes classifier.

3 Answers
  •  北海茫月
    2020-12-02 06:15

    Hope I'm not too late. I recently wrote a library called Mixed Naive Bayes, written in NumPy. It can assume a mix of Gaussian and categorical (multinoulli) distributions on the training data features.

    https://github.com/remykarem/mixed-naive-bayes

    The library is written such that the APIs are similar to scikit-learn's.

    In the example below, let's assume that the first 2 features are from a categorical distribution and the last 2 are Gaussian. When constructing the classifier, just specify categorical_features=[0,1], indicating that columns 0 and 1 are to follow a categorical distribution.

    from mixed_naive_bayes import MixedNB
    X = [[0, 0, 180.9, 75.0],
         [1, 1, 165.2, 61.5],
         [2, 1, 166.3, 60.3],
         [1, 1, 173.0, 68.2],
         [0, 2, 178.4, 71.0]]
    y = [0, 0, 1, 1, 0]
    clf = MixedNB(categorical_features=[0,1])
    clf.fit(X,y)
    clf.predict(X)
    

    The library is pip-installable via pip install mixed-naive-bayes. More information on usage is in the README.md file. Pull requests are greatly appreciated :)
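
    Not part of the library above, but since the question asks about scikit-learn specifically: a minimal sketch of the same idea in plain scikit-learn is to fit CategoricalNB on the categorical columns and GaussianNB on the continuous ones, then combine their per-class log-posteriors, subtracting one copy of the class log-prior so the prior isn't counted twice. The cat_cols/num_cols split mirrors the toy data above, and CategoricalNB requires scikit-learn >= 0.22; treat this as an illustration rather than a drop-in implementation.

    import numpy as np
    from sklearn.naive_bayes import CategoricalNB, GaussianNB

    X = np.array([[0, 0, 180.9, 75.0],
                  [1, 1, 165.2, 61.5],
                  [2, 1, 166.3, 60.3],
                  [1, 1, 173.0, 68.2],
                  [0, 2, 178.4, 71.0]])
    y = np.array([0, 0, 1, 1, 0])

    cat_cols, num_cols = [0, 1], [2, 3]  # assumed split, as in the example above

    # One Naive Bayes model per feature type
    cat_nb = CategoricalNB().fit(X[:, cat_cols].astype(int), y)
    num_nb = GaussianNB().fit(X[:, num_cols], y)

    # log P(y|x_cat) + log P(y|x_num) - log P(y) equals the full Naive Bayes
    # log-posterior over all four features up to a per-sample constant,
    # so taking the argmax over classes gives the combined prediction.
    log_prior = np.log(num_nb.class_prior_)
    joint = (cat_nb.predict_log_proba(X[:, cat_cols].astype(int))
             + num_nb.predict_log_proba(X[:, num_cols])
             - log_prior)
    pred = num_nb.classes_[np.argmax(joint, axis=1)]
    print(pred)  # predicted labels for the training samples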
