one-hot-encoding

OneHotEncoder Error: cannot convert string to float

纵饮孤独 · Submitted on 2019-12-11 05:48:36
Question: I was wondering if someone could help me with this. I'm learning about multiple linear regression and was trying to do some practice, but I seem to have hit a problem. I was trying to convert payment_type into a categorical variable using OneHotEncoder. Here are the error and the first few rows and columns of the data. I tried the fixes other people suggested online, but I kept getting errors from those as well. Is there a way to fix this? I've been trying for hours now.
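One common cause of "cannot convert string to float" is passing string columns straight into an encoder that expects numeric input. A sketch of a fix using ColumnTransformer, which applies OneHotEncoder only to the categorical column (the data below is illustrative, not the poster's dataset):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the poster's data:
# 'payment_type' is the string column to one-hot encode.
df = pd.DataFrame({
    "payment_type": ["cash", "card", "cash", "transfer"],
    "amount": [10.0, 25.5, 7.0, 99.0],
})

# OneHotEncoder accepts strings directly; ColumnTransformer
# restricts it to the categorical column and passes the rest through.
ct = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["payment_type"])],
    remainder="passthrough",
)
X = ct.fit_transform(df)
print(X.shape)  # (4, 4): one column per category plus the passthrough 'amount'
```

This avoids the older LabelEncoder-then-OneHotEncoder two-step, which is where the string-to-float error usually appeared.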

Keras one-hot encoding memory management - best possible way out

安稳与你 · Submitted on 2019-12-11 02:43:40
Question: I know this problem has been answered in different ways in the past, but I am not able to fit those answers into my code and need help. I am using the Cornell movie corpus as my dataset; the end goal is to train an LSTM model for a chatbot. But I am stuck at the initial one-hot encoding step and am running out of memory. Note that the VM I am training on has 86 GB of memory but still runs out. In nmt_special_utils_mod.py the one-hot encoding goes beyond the allocated memory, and I am not able to
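Materializing the full one-hot tensor for a large corpus is usually what blows the memory budget. A sketch of the generator approach, which keeps only one batch in memory at a time (function and variable names are illustrative, not from the poster's code):

```python
import numpy as np

def one_hot_batches(sequences, vocab_size, batch_size=64):
    """Yield one-hot encoded batches lazily, so only one batch
    of shape (batch, timesteps, vocab_size) exists at a time."""
    for start in range(0, len(sequences), batch_size):
        batch = sequences[start:start + batch_size]
        out = np.zeros((len(batch), len(batch[0]), vocab_size), dtype=np.float32)
        for i, seq in enumerate(batch):
            for t, token in enumerate(seq):
                out[i, t, token] = 1.0
        yield out

# Illustrative: 1000 sequences of length 10 over a 5000-token vocabulary
seqs = np.random.randint(0, 5000, size=(1000, 10))
gen = one_hot_batches(seqs, vocab_size=5000, batch_size=64)
first = next(gen)
print(first.shape)  # (64, 10, 5000)
```

Such a generator can be fed to Keras via fit with a generator (or tf.data). For LSTMs specifically, an Embedding layer fed integer token ids avoids one-hot encoding the inputs entirely.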

Pandas One hot encoding: Bundling together less frequent categories

喜你入骨 · Submitted on 2019-12-10 23:13:18
Question: I'm one-hot encoding a categorical column which has about 18 distinct values. I want to create new columns only for the values that appear more often than some threshold (say 1%), and one extra column, 'other values', which is 1 if the value is anything other than those frequent values. I'm using pandas with scikit-learn. I've explored pandas' get_dummies and scikit-learn's OneHotEncoder, but I can't figure out how to bundle the less frequent values into one column.
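One way to do the bundling in plain pandas is to collapse the rare values into a single bucket before calling get_dummies (the data and the 5% threshold below are illustrative):

```python
import pandas as pd

# Illustrative column with a long tail of rare values
s = pd.Series(["a"] * 50 + ["b"] * 40 + ["c"] * 5 + ["d"] * 3 + ["e"] * 2)

threshold = 0.05  # keep values appearing in more than 5% of rows
freq = s.value_counts(normalize=True)
keep = freq[freq > threshold].index

# Rare values are collapsed into one 'other' bucket,
# then get_dummies encodes the reduced category set.
bundled = s.where(s.isin(keep), other="other")
dummies = pd.get_dummies(bundled)
print(list(dummies.columns))  # ['a', 'b', 'other']
```

Recent scikit-learn versions (1.1+) also offer this directly via OneHotEncoder's min_frequency parameter, which groups infrequent categories automatically.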

sklearn mask for OneHotEncoder does not work

社会主义新天地 · Submitted on 2019-12-10 15:29:04
Question: Consider data like: from sklearn.preprocessing import OneHotEncoder; import numpy as np; dt = 'object, i4, i4'; d = np.array([('aaa', 1, 1), ('bbb', 2, 2)], dtype=dt). I want to exclude the text column from the OneHotEncoder functionality. Why does the following not work? ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool)); ohe.fit(d) raises ValueError: could not convert string to float: 'bbb'. The documentation says: categorical_features: "all" or array of indices or mask :

How to consistently hot encode dataframes with changing values?

谁说我不能喝 · Submitted on 2019-12-10 11:20:59
Question: I'm getting a stream of content in the form of dataframes, each batch with different values in its columns. For example, one batch might look like: day1_data = {'state': ['MS', 'OK', 'VA', 'NJ', 'NM'], 'city': ['C', 'B', 'G', 'Z', 'F'], 'age': [27, 19, 63, 40, 93]} and another like: day2_data = {'state': ['AL', 'WY', 'VA'], 'city': ['A', 'B', 'E'], 'age': [42, 52, 73]}. How can the columns be hot encoded in a way that returns a consistent number of columns? If I use pandas's get_dummies() on
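If the full set of possible values is known up front, casting each batch to a fixed pandas Categorical makes get_dummies emit the same columns every time, including for values absent from that day's batch (the state list below is illustrative):

```python
import pandas as pd

all_states = ["AL", "MS", "NJ", "NM", "OK", "VA", "WY"]  # full known vocabulary

day1 = pd.DataFrame({"state": ["MS", "OK", "VA", "NJ", "NM"]})
day2 = pd.DataFrame({"state": ["AL", "WY", "VA"]})

def encode(df, categories):
    # A fixed Categorical guarantees identical dummy columns
    # for every batch, even for categories with zero rows today.
    cat = pd.Categorical(df["state"], categories=categories)
    return pd.get_dummies(cat)

d1 = encode(day1, all_states)
d2 = encode(day2, all_states)
print(d1.shape, d2.shape)  # (5, 7) (3, 7) -- same 7 columns in both
```

Alternatively, a scikit-learn OneHotEncoder fitted once with explicit categories (and handle_unknown="ignore") achieves the same consistency across batches.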

How to do this with pd.get_dummies, or another way?

老子叫甜甜 · Submitted on 2019-12-09 19:04:53
Question: Actually, my problem builds on: Is there a faster way to update dataframe column values based on conditions? So the data is: import pandas as pd; import io; t = """ AV4MdG6Ihowv-SKBN_nB DTP,FOOD AV4Mc2vNhowv-SKBN_Rn Cash 1,FOOD AV4MeisikOpWpLdepWy6 DTP,Bar AV4MeRh6howv-SKBOBOn Cash 1,FOOD AV4Mezwchowv-SKBOB_S DTOT,Bar AV4MeB7yhowv-SKBOA5b DTP,Bar """; data_vec = pd.read_csv(io.StringIO(t), sep='\s{2,}', names=['id','source']); data_vec. This is data_vec: id source 0 AV4MdG6Ihowv-SKBN_nB
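Since the source column holds comma-separated multi-labels ('DTP,FOOD' means both DTP and FOOD apply), plain get_dummies on the whole string would treat each combination as one category. Series.str.get_dummies splits on the separator first, which is likely what is wanted here:

```python
import io
import pandas as pd

# Reconstruction of the question's data (columns separated by 2+ spaces)
t = """
AV4MdG6Ihowv-SKBN_nB    DTP,FOOD
AV4Mc2vNhowv-SKBN_Rn    Cash 1,FOOD
AV4MeisikOpWpLdepWy6    DTP,Bar
AV4MeRh6howv-SKBOBOn    Cash 1,FOOD
AV4Mezwchowv-SKBOB_S    DTOT,Bar
AV4MeB7yhowv-SKBOA5b    DTP,Bar
"""
data_vec = pd.read_csv(io.StringIO(t), sep=r"\s{2,}",
                       names=["id", "source"], engine="python")

# str.get_dummies splits each cell on ',' and one-hot encodes
# every token, so 'DTP,FOOD' sets both the DTP and FOOD columns to 1.
dummies = data_vec["source"].str.get_dummies(sep=",")
print(list(dummies.columns))  # ['Bar', 'Cash 1', 'DTOT', 'DTP', 'FOOD']
```

The result can be joined back with pd.concat([data_vec, dummies], axis=1) to keep the ids alongside the indicator columns.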

OneHotEncoding Mapping

别等时光非礼了梦想. · Submitted on 2019-12-09 01:49:03
Question: To discretize categorical features I'm using a LabelEncoder and a OneHotEncoder. I know that LabelEncoder maps data alphabetically, but how does OneHotEncoder map data? I have a pandas dataframe, dataFeat, with 5 different columns and 4 possible labels, as follows. dataFeat = data[['Feat1', 'Feat2', 'Feat3', 'Feat4', 'Feat5']] Feat1 Feat2 Feat3 Feat4 Feat5 A B A A A B B C C C D D A A B C C A A A I apply a LabelEncoder like this: le = preprocessing.LabelEncoder() intIndexed = dataFeat.apply(le

Exporting TensorFlow predictions to CSV, but the result contains all zeros - is this because of one-hot encoding?

有些话、适合烂在心里 · Submitted on 2019-12-08 06:28:07
Question: I am using the TensorFlow framework for my classification predictions. My dataset contains around 1160 output classes. The output class values are 6-digit numbers, for example 789954. After training and testing the dataset with TensorFlow, I got an accuracy of around 99%. Now the second step is to write the prediction outcomes to a CSV file so that I can check whether the predicted outcomes (logits) match the original labels in the set. We know that logits are one-hot encoded vectors for my . So, I have
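Writing the raw one-hot (or logit) rows to CSV produces rows that are almost entirely zeros; the rows must first be collapsed with argmax and mapped back to the original 6-digit class values. A sketch (the class_values array and the logits below are illustrative, not the poster's model output):

```python
import numpy as np

# Hypothetical mapping from the network's output index back to
# the original 6-digit labels (an illustrative subset of ~1160 classes).
class_values = np.array([100234, 456789, 789954])

# Two illustrative prediction rows (logits, one per example)
logits = np.array([
    [0.1, 0.2, 9.5],
    [8.7, 0.3, 0.1],
])

# argmax collapses each one-hot/logit row to a class index,
# which is then looked up in the original label array.
pred_idx = np.argmax(logits, axis=1)
pred_labels = class_values[pred_idx]
print(pred_labels)  # [789954 100234]

# np.savetxt("predictions.csv", pred_labels, fmt="%d")  # then export as integers
```

The same argmax step applied to the true one-hot labels lets both columns be compared side by side in the CSV.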

Mixing numerical and categorical data into keras sequential model with Dense layers

安稳与你 · Submitted on 2019-12-06 15:51:32
I have a training set in a pandas DataFrame, and I pass this data frame into model.fit() with df.values. Here is some information about the df: df.values.shape # (981, 5) df.values[0] # array([163, 0.6, 83, 0.52, array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, (a long nested one-hot vector of zeros; the printout is truncated in the original)
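The nested array inside each row means df.values is an object-dtype array, which Keras Dense layers cannot consume; the scalar features and the one-hot block have to be flattened into one plain float matrix first. A sketch of that flattening (the shapes below mirror the printout but are illustrative):

```python
import numpy as np

# Illustrative stand-in: 4 scalar features plus a nested one-hot
# vector per row, which is what makes df.values an object array.
row_scalars = np.array([[163, 0.6, 83, 0.52]])
row_onehot = np.zeros((1, 121))
row_onehot[0, 3] = 1.0

# Horizontally stack the scalars and the one-hot block into a
# single float32 matrix that model.fit() can accept directly.
X = np.hstack([row_scalars, row_onehot]).astype("float32")
print(X.shape)  # (1, 125)
```

For a whole DataFrame, the same idea is np.hstack([df[scalar_cols].to_numpy(), np.stack(df["onehot_col"].to_numpy())]), producing one 2-D float array with a fixed column count.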
