Reshape and write ImageDataGenerator output to CSV file

Question

I'm working with the MNIST data set. I have the training data vectors in one CSV file (i.e. 60,000 rows, each with 784 columns), and the labels in a separate CSV file.

I want to augment the training data to increase its size, and append the new samples to the CSV. It has to be done this way because the CSV file is then fed into a separate pipeline.

I originally wrote this script:

import keras
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

X_train = pd.read_csv('train-images-idx3-ubyte.csv')


datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False,  
        samplewise_std_normalization=False, 
        zca_whitening=False,  
        rotation_range=10,  
        zoom_range = 0.2, 
        width_shift_range=0.2,  
        height_shift_range=0.2,  
        horizontal_flip=False,  
        vertical_flip=False) 


datagen.fit(X_train)

And I got the error:

ValueError: Input to `.fit()` should have rank 4. Got array with shape: (59999, 784)
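
A side note on that shape: the file is described as having 60,000 rows, yet the array has 59,999. A likely cause, assuming the CSV has no header line, is that `pd.read_csv` treats the first line as column names by default. The sketch below uses a made-up four-column file to show the difference; `header=None` is a real pandas parameter, while the sample data is invented for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for a headerless pixel CSV: 3 rows, 4 columns (invented data).
csv_text = "9,8,7,6\n1,2,3,4\n5,6,7,8\n"

# Default behaviour: the first line is consumed as column names.
with_default = pd.read_csv(io.StringIO(csv_text))

# header=None keeps every line as a data row.
with_header_none = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_default.shape)      # one row fewer than the file has lines
print(with_header_none.shape)  # all rows kept
```

If the real file does have a header line, the default is already correct and the file simply holds 59,999 samples.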

So then I reshaped the data, and ran it again:

import keras
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

X_train = pd.read_csv('train-images-idx3-ubyte.csv')
X_train = X_train.values.reshape(-1,28,28,1)

datagen = ImageDataGenerator(
        featurewise_center=False,  
        samplewise_center=False,  
        featurewise_std_normalization=False, 
        samplewise_std_normalization=False,
        zca_whitening=False,  
        rotation_range=10, 
        zoom_range = 0.2, 
        width_shift_range=0.2,  
        height_shift_range=0.2,  
        horizontal_flip=False, 
        vertical_flip=False)  


datagen.fit(X_train)

But now I'm stuck. How do I (1) reshape the data back to its original format, and (2) append the extra output to the CSV file (or write it to a new one), so that the output looks exactly like the input (i.e. 784 columns), just with extra rows added?
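
For the reshaping and CSV part on its own, a minimal sketch with NumPy and pandas might look like the following. It assumes the augmented images arrive as an array of shape `(N, 28, 28, 1)`; random data stands in for them here, and `train_augmented.csv` is a hypothetical file name written to a temporary directory.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for rows loaded from the CSV: 5 samples, 784 pixel columns.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(5, 784), dtype=np.uint8)

# Forward reshape, as in the script above.
images = original.reshape(-1, 28, 28, 1)

# Reverse reshape: one row of 784 values per image.
flat = images.reshape(len(images), -1)

# Hypothetical output path; a temp directory keeps the sketch side-effect free.
csv_path = os.path.join(tempfile.mkdtemp(), "train_augmented.csv")

# Write the original rows, then append the extra rows without header or index.
pd.DataFrame(original).to_csv(csv_path, header=False, index=False)
pd.DataFrame(flat).to_csv(csv_path, mode="a", header=False, index=False)

combined = pd.read_csv(csv_path, header=None)
print(combined.shape)
```

Because `reshape` only changes the view of the data, flattening with `reshape(len(images), -1)` restores the pixel order of the original rows exactly.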

When I change the last line from:

datagen.fit(X_train)

To:

output = datagen.fit(X_train)
print(output[0])

The error is:

    print(output[0])
TypeError: 'NoneType' object is not subscriptable
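
The `None` is expected: `fit` only computes dataset statistics for options such as `featurewise_center` and returns nothing. Augmented images come from `datagen.flow(...)`, which yields batches indefinitely, so the caller has to decide when to stop. Below is a rough sketch of collecting a fixed number of batches into 784-column rows. The helper `collect_augmented_rows` and the stand-in generator `fake_flow` are hypothetical names, and rounding the output to `uint8` is an assumption about what the downstream pipeline expects.

```python
import itertools

import numpy as np


def collect_augmented_rows(batch_iter, n_batches):
    """Take n_batches from a (possibly endless) iterator of (N, 28, 28, 1)
    image batches and return them as an (M, 784) uint8 array."""
    batches = list(itertools.islice(batch_iter, n_batches))
    stacked = np.concatenate(batches, axis=0)
    rows = stacked.reshape(len(stacked), -1)
    # Assumption: the pipeline wants integer pixel values in 0..255.
    return np.clip(np.rint(rows), 0, 255).astype(np.uint8)


def fake_flow(x, batch_size):
    """Stand-in for datagen.flow(x, batch_size=...): loops forever and
    yields float32 batches, with no augmentation applied."""
    while True:
        for start in range(0, len(x), batch_size):
            yield x[start:start + batch_size].astype(np.float32)


x = np.zeros((10, 28, 28, 1), dtype=np.uint8)
rows = collect_augmented_rows(fake_flow(x, batch_size=4), n_batches=3)
print(rows.shape)
```

With the real generator, the first argument would be something like `datagen.flow(X_train, batch_size=32, shuffle=False)`. Passing the labels as a second argument to `flow` makes it yield `(images, labels)` pairs instead, which this helper does not handle. Since every featurewise and ZCA option is set to `False` here, the `fit` call should not be needed at all.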

So I don't understand how to do this in practice. If someone could show me the code, I'd appreciate it.

Note that this data eventually needs to be put back into the MNIST-specific binary format.

Edit 1: I've just added the tensorflow tag because I know the two are closely linked, and if there's a better method in TensorFlow for this purpose, that would be great too.

Source: https://stackoverflow.com/questions/65142970/reshape-and-write-imagedatagenerator-output-to-csv-file

Tags

python

tensorflow

machine-learning

keras

mnist