one-hot-encoding

Mixing numerical and categorical data into keras sequential model with Dense layers

瘦欲@ · Submitted on 2019-12-23 02:03:32

Question: I have a training set in a Pandas dataframe, and I pass this data frame into model.fit() with df.values. Here is some information about the df:

df.values.shape  # (981, 5)
df.values[0]
# array([163, 0.6, 83, 0.52,
#        array([0, 0, 0, 0, 0, ...   <- long run of zeros, truncated in the excerpt
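A likely cause, inferred from the printed row rather than stated in the excerpt: one dataframe column holds an entire one-hot array per cell, so df.values is a ragged object array that model.fit() cannot consume. A minimal numpy sketch of the usual workaround, with invented shapes and values: expand the embedded vectors and hstack everything into one flat float matrix.

```python
import numpy as np

# Illustrative stand-in for df.values: three scalar features plus one
# embedded one-hot vector per row (this is what makes dtype=object).
rows = [
    [163, 0.6, 83, np.array([1, 0, 0, 0])],
    [150, 0.4, 70, np.array([0, 0, 1, 0])],
]

scalars = np.array([r[:3] for r in rows], dtype=np.float32)   # shape (2, 3)
one_hots = np.stack([r[3] for r in rows]).astype(np.float32)  # shape (2, 4)

# A flat 2-D float matrix that a Dense input layer can accept.
X = np.hstack([scalars, one_hots])
print(X.shape)  # (2, 7)
```

With this layout the Dense model's input_dim is simply X.shape[1].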

ValueError: Can't handle mix of multilabel-indicator and binary

 ̄綄美尐妖づ · Submitted on 2019-12-22 09:12:19

Question: I am using Keras with the scikit-learn wrapper. In particular, I want to use GridSearchCV for hyper-parameter optimisation. This is a multi-class problem, i.e. the target variable can have only one label chosen from a set of n classes. For instance, the target variable can be 'Class1', 'Class2', ..., 'Classn'.

# self._arch creates my model
nn = KerasClassifier(build_fn=self._arch, verbose=0)
clf = GridSearchCV(
    nn,
    param_grid={ ... },
    # I use the macro-averaged F1 score
    scoring='f1_macro',
    n_jobs=-1
    ...   <- truncated in the excerpt
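The "mix of multilabel-indicator and binary" error typically means the scorer received one-hot targets while the predictions were plain class indices (or vice versa). A hedged sketch with invented labels of why 'f1_macro' wants 1-D integer labels on both sides:

```python
import numpy as np
from sklearn.metrics import f1_score

# One-hot targets (what the classifier may have been handed) and the
# 1-D class indices that scorers such as 'f1_macro' actually expect.
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1],
                     [0, 1, 0]])
y_labels = np.argmax(y_onehot, axis=1)  # 1-D class indices

# Hypothetical model predictions, already as class indices.
y_pred = np.array([0, 1, 2, 2])

# Scoring 1-D labels against 1-D predictions works; passing y_onehot
# here instead would raise the "mix of multilabel-indicator ..." error.
score = f1_score(y_labels, y_pred, average='macro')
```

With KerasClassifier, converting y via np.argmax before calling clf.fit is the common workaround.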

How to handle One-Hot Encoding in production environment when number of features in Training and Test are different?

£可爱£侵袭症+ · Submitted on 2019-12-22 08:46:38

Question: While doing certain experiments, we usually train on 70% and test on 30%. But what happens when your model is in production? The following may occur:

Training Set:
-----------------------
| Ser | Type Of Car   |
-----------------------
|  1  | Hatchback     |
|  2  | Sedan         |
|  3  | Coupe         |
|  4  | SUV           |
-----------------------

After one-hot encoding this, this is what we get:

-----------------------------------------
| Ser | Hatchback | Sedan | Coupe | SUV |
-----------------------------------------
|  1  ...   <- truncated in the excerpt
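One common answer: fit the encoder once on the training categories, persist it with the model, and let unseen production categories map to an all-zero row. A sketch with scikit-learn's OneHotEncoder (the car categories mirror the question; 'Convertible' is an invented unseen value):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([['Hatchback'], ['Sedan'], ['Coupe'], ['SUV']])

# Fit on the training data only and persist this encoder alongside the
# model; handle_unknown='ignore' maps categories never seen in training
# to an all-zeros row instead of raising at prediction time.
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(train)

prod = np.array([['Sedan'], ['Convertible']])  # 'Convertible' never seen
encoded = enc.transform(prod).toarray()
# 'Sedan' gets its training-time column; 'Convertible' becomes all zeros.
```

The feature width is therefore fixed at training time, which is exactly what a production model needs.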

How to encode categorical features in sklearn?

流过昼夜 · Submitted on 2019-12-21 05:29:10

Question: I have a dataset with 41 features [columns 0 to 40], of which 7 are categorical. This categorical set is divided into two subsets: a subset of string type (column-features 1, 2, 3) and a subset of int type, in binary form 0 or 1 (column-features 6, 11, 20, 21). Furthermore, column-features 1, 2 and 3 (of string type) have cardinality 3, 66 and 11 respectively. In this context I have to encode them to use a support vector machine algorithm. This is the code that I have:

import numpy as np
...   <- truncated in the excerpt
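A hedged sketch of one way to handle this layout with ColumnTransformer: one-hot encode only the string columns and pass the already-binary int columns through untouched. The toy data and column positions below are invented stand-ins, not the questioner's 41-column dataset:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in: string categoricals in columns 1-3, numeric/binary
# features elsewhere (dtype=object because the columns are mixed).
X = np.array([
    [5, 'tcp', 'http', 'SF', 0, 1],
    [3, 'udp', 'dns',  'SF', 1, 0],
    [9, 'tcp', 'ftp',  'S0', 0, 1],
], dtype=object)

ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), [1, 2, 3])],  # encode string columns only
    remainder='passthrough')                   # keep numeric/binary columns

X_enc = ct.fit_transform(X)
# Each string column expands to one column per category (2 + 3 + 2 here),
# the other three columns pass through, so an SVM can consume the result.
```

Binary 0/1 columns are already in a form the SVM can use, so encoding them again would only duplicate information.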

applying onehotencoder on numpy array

会有一股神秘感。 · Submitted on 2019-12-13 23:00:17

Question: I am applying OneHotEncoder on a numpy array. Here's the code:

print X.shape, test_data.shape  # gives (4100, 15) (410, 15)
onehotencoder_1 = OneHotEncoder(categorical_features=[0, 3, 4, 5, 6, 8, 9, 11, 12])
X = onehotencoder_1.fit_transform(X).toarray()
onehotencoder_2 = OneHotEncoder(categorical_features=[0, 3, 4, 5, 6, 8, 9, 11, 12])
test_data = onehotencoder_2.fit_transform(test_data).toarray()
print X.shape, test_data.shape  # gives (4100, 46) (410, 43)

where both X and test_data are <type ...   <- truncated in the excerpt
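The diverging widths (46 vs 43) come from fitting two independent encoders, so each dataset defines its own category set. A sketch of the usual fix with tiny invented data: fit one encoder on the training data and reuse it for the test data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = np.array([[0], [1], [2], [3]])
test = np.array([[1], [3], [4]])  # category 4 never appears in training

# Fit ONE encoder on the training data and reuse it; a second
# fit_transform on the test data would derive a different (here smaller)
# set of categories and therefore a different number of columns.
enc = OneHotEncoder(handle_unknown='ignore')
X = enc.fit_transform(train).toarray()
X_test = enc.transform(test).toarray()

print(X.shape, X_test.shape)  # (4, 4) (3, 4) -- identical widths
```

Note that categorical_features was removed from OneHotEncoder in newer scikit-learn versions; ColumnTransformer is the modern way to encode only selected columns.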

ValueError: Columns must be same length as key

自闭症网瘾萝莉.ら · Submitted on 2019-12-12 19:24:39

Question: I have a problem running the code below. data is my dataframe, X is the list of columns for the training data, and L is a list of categorical features with numeric values. I want to one-hot encode my categorical features, so I do as follows. But a "ValueError: Columns must be same length as key" is thrown for the last line, and after long research I still don't understand why.

def turn_dummy(df, prop):
    dummies = pd.get_dummies(df[prop], prefix=prop, sparse=True)
    df.drop(prop, axis=1, inplace=True)
    ...   <- truncated in the excerpt
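The error is characteristic of assigning a multi-column dummies frame back to a single column key (something like df[prop] = dummies). A sketch of the usual fix, with invented toy data and sparse=True dropped for simplicity: concatenate the dummies along axis=1 instead.

```python
import pandas as pd

def turn_dummy(df, prop):
    """One-hot encode column `prop` and splice the dummy columns back in.

    Assigning the multi-column dummies frame to the single key df[prop]
    raises "Columns must be same length as key"; pd.concat along axis=1
    is the standard fix.
    """
    dummies = pd.get_dummies(df[prop], prefix=prop)
    df = df.drop(prop, axis=1)
    return pd.concat([df, dummies], axis=1)

data = pd.DataFrame({'color': ['red', 'blue', 'red'], 'value': [1, 2, 3]})
data = turn_dummy(data, 'color')
print(list(data.columns))  # ['value', 'color_blue', 'color_red']
```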

How to go back from ONE-HOT-ENCODED labels to single column using sklearn?

為{幸葍}努か · Submitted on 2019-12-12 18:37:36

Question: I have predicted some data using a model and am getting results of this kind:

[[0 0 0 ... 0 0 1]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 1]
 [0 0 0 ... 0 0 0]]

These are basically the one-hot encoded labels of the target column. Now I want to somehow get back to a single column of the original values. I used these lines to do my encoding. How can I go back to a single column?

le_candidate = LabelEncoder()
df['candidate_encoded'] = le_candidate.fit_transform(df.Candidate)
candidate...   <- truncated in the excerpt
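A sketch of the usual round trip: argmax over the one-hot rows recovers the integer codes, and LabelEncoder.inverse_transform maps those back to the original strings. The candidate names here are invented stand-ins for df.Candidate:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

le_candidate = LabelEncoder()
names = np.array(['Trump', 'Clinton', 'Sanders', 'Clinton'])
encoded = le_candidate.fit_transform(names)  # integer codes

# Stand-in for the model's one-hot (or softmax) output matrix.
predictions = np.eye(len(le_candidate.classes_))[encoded]

# argmax picks the hot column per row; inverse_transform undoes the
# original label encoding, giving back one column of strings.
recovered = le_candidate.inverse_transform(predictions.argmax(axis=1))
print(recovered)  # ['Trump' 'Clinton' 'Sanders' 'Clinton']
```

The same argmax-then-inverse_transform pattern works on real softmax probabilities, since argmax only needs the largest entry per row.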

Lambda layer in Keras with keras.backend.one_hot gives TypeError

懵懂的女人 · Submitted on 2019-12-11 22:04:14

Question: I'm trying to train a character-level CNN using Keras. I take a single word as input. I have already transformed the words into lists of indices, but when I try to feed them into one_hot, I get a TypeError.

>>> X_train[0]
array([31, 14, 23, 29, 27, 18, 12, 30, 21, 10, 27, 0, 0, 0, 0, 0, 0, 0,
       0, 0, ...   <- zero padding, truncated in the excerpt
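K.one_hot expects integer indices, while Keras feeds layer inputs as floats by default; the commonly cited fix is to cast inside the Lambda, e.g. Lambda(lambda x: K.one_hot(K.cast(x, 'int32'), num_chars)), where num_chars is an assumed alphabet size. What one_hot computes can be sketched in plain numpy:

```python
import numpy as np

def one_hot(indices, num_classes):
    # Equivalent of K.one_hot for a 1-D index array: row i is the
    # one-hot vector for indices[i]. The indices MUST be integers;
    # float indices are what triggers the TypeError in the question.
    indices = np.asarray(indices)
    assert np.issubdtype(indices.dtype, np.integer), "cast to int first"
    return np.eye(num_classes, dtype=np.float32)[indices]

word = np.array([31, 14, 23, 0, 0], dtype=np.int32)  # padded index list
encoded = one_hot(word, num_classes=32)
print(encoded.shape)  # (5, 32)
```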

implement N-hot encoding in tf.slim

旧城冷巷雨未停 · Submitted on 2019-12-11 08:42:48

Question: How can I implement N-hot encoding according to the indices of the 1s in a tf.int64? The input is a tensor containing several tf.int64 values. The N-hot encoding is meant to replace one-hot encoding in tf.slim. The one-hot encoding is implemented as follows:

def dense_to_one_hot(labels_dense, num_classes):
    """Convert class labels from scalars to one-hot vectors."""
    num_labels = labels_dense.shape[0]
    index_offset = numpy.arange(num_labels) * num_classes
    labels_one_hot = numpy.zeros((num_labels, num_classes))
    ...   <- truncated in the excerpt
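A numpy sketch of the N-hot idea, extending the quoted dense_to_one_hot so each row may carry several 1s (the label lists below are invented examples):

```python
import numpy as np

def dense_to_n_hot(label_lists, num_classes):
    """N-hot analogue of dense_to_one_hot: row i gets a 1 at every
    index listed in label_lists[i], and 0 elsewhere."""
    n_hot = np.zeros((len(label_lists), num_classes))
    for row, labels in enumerate(label_lists):
        n_hot[row, list(labels)] = 1.0
    return n_hot

encoded = dense_to_n_hot([[0, 2], [1], [0, 3, 4]], num_classes=5)
# row 0 -> [1, 0, 1, 0, 0], row 1 -> [0, 1, 0, 0, 0], ...
```

On the tensor side, a hedged graph-mode equivalent is tf.reduce_max(tf.one_hot(indices, depth), axis=1), which collapses the per-index one-hot vectors of each row into a single N-hot vector.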

pandas faster series of lists unrolling for one-hot encoding?

痴心易碎 · Submitted on 2019-12-11 07:01:33

Question: I'm reading from a database that has many array-type columns, for which pd.read_sql gives me a dataframe with columns of dtype=object containing lists. I'd like an efficient way to find which rows have arrays containing some element:

s = pd.Series([[1, 2, 3], [1, 2], [99], None, [88, 2]])
print s
# 0    [1, 2, 3]
# 1       [1, 2]
# 2         [99]
# 3         None
# 4      [88, 2]

I'm building 1-hot-encoded feature tables for an ML application, and I'd like to end up with tables like:

   contains_1  contains_2  contains_3  contains_88
0           1  ...   <- truncated in the excerpt
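One compact approach (a sketch, not necessarily the fastest for a very large table) is scikit-learn's MultiLabelBinarizer over the question's own toy series, treating None as an empty list:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

s = pd.Series([[1, 2, 3], [1, 2], [99], None, [88, 2]])

# Replace None with an empty list, then binarize each row's elements:
# one output column per distinct element across the whole series.
mlb = MultiLabelBinarizer()
hot = pd.DataFrame(
    mlb.fit_transform(s.apply(lambda x: x or [])),
    columns=['contains_%s' % c for c in mlb.classes_],
    index=s.index)
print(hot.columns.tolist())
# ['contains_1', 'contains_2', 'contains_3', 'contains_88', 'contains_99']
```

The None row comes out as all zeros, and the result aligns with the original series index, so it can be joined straight back onto the source dataframe.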