I often want to bucket an unordered collection in Python. itertools.groupby does the right sort of thing but almost always requires massaging to sort the items first and cat
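To illustrate the sorting issue mentioned above, here is a minimal sketch. The `words` list and the first-letter key are made up for illustration. itertools.groupby only merges adjacent items with equal keys, so unsorted input produces fragmented groups.

```python
from itertools import groupby

# Hypothetical sample data: bucket words by their first letter.
words = ["banana", "apple", "cherry", "avocado", "blueberry"]

def first_letter(word):
    return word[0]

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_groups = [(k, list(g)) for k, g in groupby(words, key=first_letter)]

# Sorting by the same key first yields one bucket per key.
buckets = {k: list(g) for k, g in groupby(sorted(words, key=first_letter), key=first_letter)}

print(unsorted_groups)
print(buckets)
```

The unsorted call returns five single-item groups, while the sorted call returns three buckets, one each for "a", "b" and "c".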
If it's a pandas.DataFrame, the following also works, using pd.cut():
from sklearn import datasets
import pandas as pd
# import some data to play with
iris = datasets.load_iris()
df_data = pd.DataFrame(iris.data[:, 0])  # we'll just take the first feature
# bucketize
n_bins = 5
feature_name = iris.feature_names[0].replace(" ", "_")
my_labels = [f"{feature_name}_{num}" for num in range(n_bins)]
pd.cut(df_data[0], bins=n_bins, labels=my_labels)
yielding
0    sepal_length_(cm)_1
1    sepal_length_(cm)_0
2    sepal_length_(cm)_0
[...]
If you don't set the labels, the output is going to look like this:
0 (5.02, 5.74]
1 (4.296, 5.02]
2 (4.296, 5.02]
[...]
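The interval edges above can be reproduced by hand. The following is a rough sketch rather than pandas' actual code. It assumes the values span 4.3 to 7.9, as the intervals imply, and that pd.cut lowers the left-most edge by 0.1% of the range so the minimum falls inside the first right-closed interval, as its documentation describes. The helper name `equal_width_edges` is hypothetical.

```python
def equal_width_edges(values, n_bins):
    """Hypothetical helper: equal-width bin edges, roughly mimicking pd.cut(values, bins=n_bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    # Assumption: nudge the first edge down by 0.1% of the range so that
    # the minimum value is included in the first interval (edge_0, edge_1].
    edges[0] = lo - 0.001 * (hi - lo)
    return edges

# Sample values chosen to span the same range as the output above.
edges = equal_width_edges([4.3, 5.1, 4.9, 4.7, 7.9], n_bins=5)
print([round(e, 3) for e in edges])
# → [4.296, 5.02, 5.74, 6.46, 7.18, 7.9]
```

The first two edges match the (4.296, 5.02] interval shown above, and 5.1 lands in the second interval, (5.02, 5.74].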