one-hot-encoding | 易学教程

Encoding Categorical Variables like “State Names”

Source: https://stackoverflow.com/questions/59716391/encoding-categorical-variables-like-state-names

One-hot-encoding multiple columns in sklearn and naming columns

Question: I have the following code to one-hot-encode two columns.

    # encode city labels using one-hot encoding scheme
    city_ohe = OneHotEncoder(categories='auto')
    city_feature_arr = city_ohe.fit_transform(df[['city']]).toarray()
    city_feature_labels = city_ohe.categories_
    city_features = pd.DataFrame(city_feature_arr, columns=city_feature_labels)

    phone_ohe = OneHotEncoder(categories='auto')
    phone_feature_arr = phone_ohe.fit_transform(df[['phone']]).toarray()
    phone_feature_labels = phone_ohe ...
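One way to approach the question above is to fit a single encoder on both columns and build the column labels by hand from `categories_`. This is only a minimal sketch: the sample data and the `<column>_<category>` naming scheme are assumptions, since the excerpt shows neither the real DataFrame nor the desired names.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data; the question's real DataFrame is not shown.
df = pd.DataFrame({
    "city": ["Paris", "Tokyo", "Paris"],
    "phone": ["android", "ios", "ios"],
})

cols = ["city", "phone"]
ohe = OneHotEncoder(categories="auto")
encoded = ohe.fit_transform(df[cols]).toarray()

# categories_ holds one array per input column, so flatten it into
# "<column>_<category>" labels instead of passing it to DataFrame directly.
labels = [f"{col}_{cat}" for col, cats in zip(cols, ohe.categories_) for cat in cats]

features = pd.DataFrame(encoded, columns=labels, index=df.index)
print(features)
```

Building the labels from `categories_` avoids depending on a particular scikit-learn version for the feature-name helper methods.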

Feature names from OneHotEncoder

Question: I am using OneHotEncoder to encode a few categorical variables (e.g. Sex and AgeGroup). The resulting feature names from the encoder look like 'x0_female', 'x0_male', 'x1_0.0', 'x1_15.0', etc.

    >>> train_X = pd.DataFrame({'Sex':['male', 'female']*3, 'AgeGroup':[0,15,30,45,60,75]})
    >>> from sklearn.preprocessing import OneHotEncoder
    >>> encoder = OneHotEncoder()
    >>> train_X_encoded = encoder.fit_transform(train_X[['Sex', 'AgeGroup']])
    >>> encoder.get_feature_names()
    array(['x0_female', 'x0 ...
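The generic `x0_`, `x1_` prefixes can be replaced by passing the original column names to the encoder's feature-name method. The sketch below reuses the question's data and assumes either scikit-learn 1.0 or later (`get_feature_names_out`) or an older release (`get_feature_names`), so it checks which method is available.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train_X = pd.DataFrame({
    "Sex": ["male", "female"] * 3,
    "AgeGroup": [0, 15, 30, 45, 60, 75],
})

cols = ["Sex", "AgeGroup"]
encoder = OneHotEncoder()
train_X_encoded = encoder.fit_transform(train_X[cols])

# Pass the input column names so they are used as prefixes instead of x0, x1.
if hasattr(encoder, "get_feature_names_out"):   # scikit-learn >= 1.0
    names = encoder.get_feature_names_out(cols)
else:                                           # older releases
    names = encoder.get_feature_names(cols)

print(list(names))
```

The exact text of the `AgeGroup_...` names depends on how the installed version formats the numeric categories.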

SciKit-Learn Label Encoder resulting in error 'argument must be a string or number'

Question: I'm a bit confused - creating an ML model here. I'm at the step where I'm trying to take categorical features from a "large" dataframe (180 columns) and one-hot them so that I can find the correlation between the features and select the "best" features. Here is my code:

    # import labelencoder
    from sklearn.preprocessing import LabelEncoder
    # instantiate labelencoder object
    le = LabelEncoder()
    # apply le on categorical feature columns
    df = df.apply(lambda col: le.fit_transform(col))
    df.head(10)
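The excerpt does not show the data, so the cause here is an assumption: this kind of error commonly appears when a column mixes strings with missing values (NaN or None), which `LabelEncoder` cannot sort. Under that assumption, a minimal sketch is to fill the gaps and cast to string before encoding, and to touch only the categorical columns. The DataFrame and the `"missing"` placeholder are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: two categorical columns with missing values, one numeric.
df = pd.DataFrame({
    "color": ["red", np.nan, "blue", "red"],
    "size": ["S", "M", None, "S"],
    "price": [1.0, 2.0, 3.0, 4.0],
})

cat_cols = ["color", "size"]  # encode only the categorical columns

encoded = df.copy()
for col in cat_cols:
    # Replace missing values and force a uniform string type before encoding.
    values = df[col].fillna("missing").astype(str)
    # A fresh encoder per column keeps each column's classes_ separate.
    encoded[col] = LabelEncoder().fit_transform(values)

print(encoded)
```

Note that `LabelEncoder` produces integer codes rather than one-hot columns; for an actual one-hot representation, `pd.get_dummies` or `OneHotEncoder` would be the usual tools.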

Pandas get_dummies to create one hot with separator = ' ' and with character level separation [duplicate]

Question: This question already has an answer here: Quickest way to make a get_dummies type dataframe from a column with a multiple of strings (1 answer). Closed 2 years ago.

    df = pd.DataFrame(["c", "b", "a p", NaN, "ap"])
    df[0].str.get_dummies(' ')

The above code prints something like this:

       a  p  b  c  ap
    0  0  0  0  1   0
    1  0  0  1  0   0
    2  1  1  0  0   0
    3  0  0  0  0   0
    4  0  0  0  0   1

The required output is the following:

       a  p  b  c
    0  0  0  0  1
    1  0  0  1  0
    2  1  1  0  0
    3  0  0  0  0
    4  1  1  0  0

I am sure it's a bit tricky. Any help is ...
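The required output treats every non-space character as its own category. One possible sketch, not taken from the linked answer, is to drop the spaces, join the remaining characters with `|`, and let `str.get_dummies` split on that default separator. Treating the missing value as an empty string is an assumption based on the all-zero row in the required output.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(["c", "b", "a p", np.nan, "ap"])

chars = (
    df[0]
    .fillna("")                          # missing value -> all-zero row
    .str.replace(" ", "", regex=False)   # drop the space separators
    .apply(lambda s: "|".join(s))        # "ap" -> "a|p"
)

# str.get_dummies splits on "|" by default and sorts columns alphabetically,
# so reorder them to match the layout asked for in the question.
dummies = chars.str.get_dummies()[["a", "p", "b", "c"]]
print(dummies)
```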

one hot encode each column in a Int matrix in R

Question: I have an issue translating a matrix into one-hot encoding in R. I implemented it in Matlab, but I have difficulty handling the object in R. Here I have an object of type 'matrix'. I would like to apply one-hot encoding to this matrix. I have a problem with the column names. Here is an example:

    > set.seed(4)
    > t <- matrix(floor(runif(10, 1,9)),5,5)
         [,1] [,2] [,3] [,4] [,5]
    [1,]    5    3    5    3    5
    [2,]    1    6    1    6    1
    [3,]    3    8    3    8    3
    [4,]    3    8    3    8    3
    [5,]    7    1    7    1    7
    > class(t)
    [1] "matrix"

Expecting:

    1_1 1_3 1_5 1_7 2 ...

How can I one hot encode a list of strings with Keras?

Question: I have a list:

    code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n',
            'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the',
            'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']

And I want to convert it to one-hot encoding. I tried:

    to_categorical(code)

And I get an error:

    ValueError: invalid literal for int() with base 10: '<s>'

What am I doing wrong?

Answer 1: keras only supports one-hot-encoding for data that has already been ...
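The error arises because `to_categorical` expects integer class indices, not strings. A minimal sketch of the usual two-step approach follows: map each token to an integer, then one-hot the integers. To keep the example runnable without Keras, `np.eye` is swapped in for `to_categorical`; the shortened token list and the sorted-vocabulary mapping are assumptions for illustration.

```python
import numpy as np

# A shortened version of the token list from the question.
code = ["<s>", "are", "defined", "in", "the", "LSTM", "the"]

# Step 1: integer-encode the tokens with a vocabulary lookup.
vocab = sorted(set(code))
token_to_id = {tok: i for i, tok in enumerate(vocab)}
ids = np.array([token_to_id[tok] for tok in code])

# Step 2: row i of the identity matrix is the one-hot vector for class i.
# With Keras installed, to_categorical(ids) should give an equivalent array.
one_hot = np.eye(len(vocab), dtype=int)[ids]
print(one_hot.shape)
```

Keeping `vocab` around makes it possible to map a one-hot row back to its token via `vocab[row.argmax()]`.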

PySpark: Output of OneHotEncoder looks odd [duplicate]

Question: This question already has an answer here: Spark ML VectorAssembler returns strange output (1 answer). Closed 2 years ago.

The Spark documentation contains a PySpark example for its OneHotEncoder:

    from pyspark.ml.feature import OneHotEncoder, StringIndexer

    df = spark.createDataFrame([
        (0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")
    ], ["id", "category"])

    stringIndexer = StringIndexer(inputCol="category", outputCol="categoryIndex")
    model = stringIndexer.fit(df)
    indexed = model ...