How to save a synthetic dataset to a CSV file using SMOTE

*爱你&永不变心* submitted on 2021-02-11 08:26:30

Question


I am oversampling credit card data with SMOTE, using the code from geeksforgeeks.org.

Running the following code produces the output shown below:

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1))) 
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0))) 

# import SMOTE module from imblearn library 
# pip install imbalanced-learn (if you don't have it installed) 
from imblearn.over_sampling import SMOTE 
sm = SMOTE(random_state = 2) 
# fit_resample replaces fit_sample, which newer imbalanced-learn releases no longer provide
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel()) 

print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape)) 
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape)) 

print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1))) 
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0))) 

Output:

Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019 

After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,) 

After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019
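The counts show what SMOTE did here: the 345 minority rows were topped up with 199019 - 345 = 198674 synthetic rows, so both classes end up with 199019. Each synthetic row is placed at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours. The snippet below is a simplified NumPy illustration of that interpolation step, not imblearn's implementation; the function name `smote_sketch` and its parameters are hypothetical, and it uses brute-force pairwise distances, which is fine for a few hundred minority rows but not for large inputs.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic rows from minority rows X_min.

    Hypothetical helper for illustration only, not the imblearn implementation.
    """
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)  # a row cannot have more neighbours than there are other rows

    # Brute-force Euclidean distances between all pairs of minority rows
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour

    # Indices of the k nearest minority neighbours of every row
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # which minority row to start from
    pick = rng.integers(0, k, size=n_new)   # which of its neighbours to move towards
    nn = X_min[neighbours[base, pick]]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_min[base] + gap * (nn - X_min[base])

# Toy minority class: the four corners of the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

In practice you would keep using `SMOTE(...).fit_resample(X, y)` from imbalanced-learn, which also decides how many rows to generate per class.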

As I am totally new to this area, I can't figure out how to save this data in CSV format. I would be very glad if anyone could help me with this.

Alternatively, if there is a reference that shows how to generate synthetic data from a dataset with SMOTE and save the updated dataset to a CSV file, please point me to it.

Something like the following image (image not available):

Thanks in advance.


Answer 1:


From what I can see in your code, X_train_res and y_train_res are NumPy arrays, so you can do something like this:

import numpy as np

y_train_res = y_train_res.reshape(-1, 1)  # reshape y_train_res from (398038,) to (398038, 1)
data_res = np.concatenate((X_train_res, y_train_res), axis=1)  # append the labels as the last column
np.savetxt('sample_smote.csv', data_res, delimiter=",")  # write one comma-separated row per sample

I haven't been able to run and check this, but let me know if you run into any issues.
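To try the reshape, concatenate and `np.savetxt` steps without the credit card data, here is a small self-contained run on toy arrays standing in for X_train_res and y_train_res (the values are made up, and the file goes to a temporary directory rather than the working directory). Note that `np.savetxt` writes every value as a float by default, so the label column reads back as 0.0/1.0 unless you pass a `fmt`.

```python
import os
import tempfile

import numpy as np

# Toy stand-ins for the resampled arrays: 3 rows, 2 features, 3 labels
X_train_res = np.array([[0.5, 1.5], [2.0, 3.0], [4.25, 5.75]])
y_train_res = np.array([0, 1, 1])

# Turn the 1-D label vector into a column so it can be appended to X
y_col = y_train_res.reshape(-1, 1)
data_res = np.concatenate((X_train_res, y_col), axis=1)  # shape (3, 3)

# Write to a temporary file, then read it back to check the round trip
path = os.path.join(tempfile.mkdtemp(), "sample_smote.csv")
np.savetxt(path, data_res, delimiter=",")

loaded = np.loadtxt(path, delimiter=",")
print(loaded.shape)  # (3, 3)
```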

Note: You will need an extra step to add column labels (a header row). Let me know once you have this working if you need help with that.
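One way to get the column labels is to go through pandas: build a DataFrame from the features with a list of column names, add the label as its own column, and write it with `to_csv(index=False)`. This is a sketch rather than the answerer's follow-up; the names `feature_0`, `feature_1` and `Class` are hypothetical placeholders, and with the real data you would reuse the original feature names (for example `X_train.columns`, if X_train was a DataFrame before resampling).

```python
import io

import numpy as np
import pandas as pd

# Toy stand-ins for the resampled arrays
X_train_res = np.array([[0.5, 1.5], [2.0, 3.0], [4.25, 5.75]])
y_train_res = np.array([0, 1, 1])

# Hypothetical column names; replace with the real feature names
feature_names = [f"feature_{i}" for i in range(X_train_res.shape[1])]

df = pd.DataFrame(X_train_res, columns=feature_names)
df["Class"] = y_train_res  # the label column keeps its integer dtype

# An in-memory buffer keeps the example self-contained;
# for a real file use df.to_csv("sample_smote.csv", index=False)
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # feature_0,feature_1,Class
```

If you prefer to stay with NumPy, `np.savetxt` also accepts `header=",".join(names)` together with `comments=""`, so the header line is not prefixed with `# `.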



Source: https://stackoverflow.com/questions/58654649/how-to-save-synthetic-dataset-in-csv-file-using-smote
