How to retain column headers of data frame after Pre-processing in scikit-learn


I have a pandas data frame which has some rows and columns. Each column has a header. Now as long as I keep doing data manipulation operations in pandas, my variable headers

4 Answers

不思量自难忘°

2020-12-13 13:18
The other answers still do not resolve the main question. They rest on two implicit assumptions:
1. That all the features of the dataset will be retained, which might not be true, e.g. when some kind of feature selection step is applied.
2. That all the features will be retained in the same order; again, some feature selection transformations might sort them implicitly.
At least some of the scikit-learn transformers provide a "get_support()" method that records which columns (features) are retained and in what order.

You can check the basics of the function and how to use it here ... Find get_support() function description here

This would be the preferred and official way to get the information needed here.
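As a rough sketch of the idea, the snippet below uses `SelectKBest` with `f_classif` as one example of a transformer that exposes `get_support()`. The toy data and the column names (`signal_a`, `noise`, `signal_b`) are made up for illustration and are not from the question.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: two informative columns and one noisy column
X = pd.DataFrame({
    "signal_a": [0.10, 0.20, 0.15, 0.90, 1.00, 0.95],
    "noise": [5, 1, 4, 2, 3, 6],
    "signal_b": [10.0, 11.0, 10.5, 20.0, 21.0, 19.0],
})
y = [0, 0, 0, 1, 1, 1]

# Keep the 2 highest-scoring features; the result is a plain NumPy array
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the input columns
kept_columns = X.columns[selector.get_support()]

# Rebuild a DataFrame that carries only the retained headers
X_selected_df = pd.DataFrame(X_selected, columns=kept_columns, index=X.index)
```

Because the mask is applied to `X.columns`, the headers follow whatever subset the selector actually kept, rather than assuming every column survived.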
难免孤独

2020-12-13 13:21
scikit-learn indeed strips the column headers in most cases, so just add them back on afterward. In your example, with X_imputed as the sklearn.preprocessing output and X_train as the original dataframe, you can put the column headers back on with:
```
# Passing index as well keeps the original row labels aligned
X_imputed_df = pd.DataFrame(X_imputed, columns=X_train.columns, index=X_train.index)
```
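For a self-contained illustration of this approach, here is a minimal sketch that assumes `SimpleImputer` with the mean strategy as the preprocessing step. The small `X_train` frame and its column names are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical training data with a few missing values
X_train = pd.DataFrame(
    {"age": [25.0, np.nan, 40.0], "income": [50.0, 60.0, np.nan]},
    index=["r1", "r2", "r3"],
)

# fit_transform returns a NumPy array, so the headers are lost here
X_imputed = SimpleImputer(strategy="mean").fit_transform(X_train)

# Put the headers (and row labels) back from the original frame
X_imputed_df = pd.DataFrame(
    X_imputed, columns=X_train.columns, index=X_train.index
)
```

This only works when the transformer keeps every column in its original order, which holds here because no column is entirely empty.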

迷失自我

2020-12-13 13:30

Adapted from part of the intermediate machine learning course on Kaggle:

```
import pandas as pd
from sklearn.impute import SimpleImputer

# Imputation (SimpleImputer fills missing values with the column mean by default)
my_imputer = SimpleImputer()
imputed_X = pd.DataFrame(my_imputer.fit_transform(X))

# Imputation removed column names; put them back
imputed_X.columns = X.columns
```


一整个雨季

2020-12-13 13:32
According to Ami Tavory's reply here, per documentation, Imputer omits empty columns or rows (however you run it).
Thus, before running the Imputer and setting the column names as described above, run something like this (for columns):
```
X_train = X_train.dropna(axis=1, how='all')
```
df.dropna is described here.
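To make the ordering of the steps concrete, here is a small sketch that assumes `SimpleImputer` as the imputer. The frame and its column names are hypothetical; column `b` is entirely empty, so it is dropped before imputation and the remaining headers line up with the imputed output.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: column "b" contains only missing values
X_train = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, 5.0, np.nan],
})

# Drop all-NaN columns first so the header list matches the imputer output
X_clean = X_train.dropna(axis=1, how="all")

# Impute the remaining gaps with the column mean
imputed = SimpleImputer(strategy="mean").fit_transform(X_clean)

# Reattach headers and row labels from the cleaned frame, not the original
X_imputed_df = pd.DataFrame(
    imputed, columns=X_clean.columns, index=X_clean.index
)
```

Taking the headers from `X_clean` rather than `X_train` is the important detail: the original frame still lists `b`, which no longer has a matching column in the imputed array.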