Understanding DictVectorizer in scikit-learn?

问题

I'm exploring the different feature extracccion classes that scikit-learn provide. Reading the documentation i did not understand very well for what DictVectorizer can be used?. Other questions come to mind for example how can DictVectorizer can be used for text classification?, i.e. how does this class could help handle labeled textual data?. Could anybody provide some little example apart from the example that i all ready read at the documentation web page?

回答1:

say your feature space is length, width and height and you have had 3 observations; i.e. you measure length, width & height of 3 objects:

       length  width  height
obs.1       1      0       2
obs.2       0      1       1
obs.3       3      2       1

another way to show this is to use a list of dictionaries:

[{'height': 1, 'length': 0, 'width': 1},   # obs.2
 {'height': 2, 'length': 1, 'width': 0},   # obs.1
 {'height': 1, 'length': 3, 'width': 2}]   # obs.3

DictVectorizer goes the other way around; i.e given the list of dictionaries builds the top frame:

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> d = [{'height': 1, 'length': 0, 'width': 1},
...      {'height': 2, 'length': 1, 'width': 0},
...      {'height': 1, 'length': 3, 'width': 2}]
>>> v.fit_transform(d)
array([[ 1.,  0.,  1.],   # obs.2
       [ 2.,  1.,  0.],   # obs.1
       [ 1.,  3.,  2.]])  # obs.3
   # height, len., width

来源：https://stackoverflow.com/questions/27473957/understanding-dictvectorizer-in-scikit-learn

标签

python

machine-learning

nlp

scikit-learn

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!