Transforming data stored in TFRecord format into inputs to an LSTM Keras model in TensorFlow and fitting the model with that data

Submitted by 亡梦爱人 on 2020-03-05 03:39:42

Question


I have a very long dataframe (25 million rows x 500 columns) which I can access as a CSV file or a Parquet file, but which I cannot load into my PC's RAM.

The data should be shaped appropriately to serve as input to a Keras LSTM model (TensorFlow 2), given a desired number of timesteps per sample and a desired number of samples per batch.

This is my second post on this subject. I have already been advised to convert the data to TFRecord format.

Since my original environment is PySpark, the way to do this transformation would be:

myDataFrame.write.format("tfrecords").option("writeLocality", "local").save("/path") 

How to convert multiple parquet files into TFrecord files using SPARK?

Assuming now that this has been done, and to keep things concrete and reproducible, let's consider a dataframe shaped 1000 rows x 3 columns, where the first two columns are features, the last column is the target, and each row corresponds to a timestamp.

For example, the first column is temperature, the second column is wind_speed, and the third column (the target) is energy_consumption. Each row corresponds to an hour. The dataset contains observations of 1,000 consecutive hours. We assume that the energy consumption at any given hour is a function of the state of the atmosphere over the several preceding hours. Therefore, we want to use an LSTM model to estimate energy consumption. We have decided to feed the LSTM model with samples, each of which contains the data from the previous 5 hours (i.e. 5 rows per sample). For simplicity, assume that the target has been shifted backwards one hour, so that a slice data[0:5, :-1] has as target data[4, -1]. Assume batch_size = 32.
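To make the target shapes concrete before worrying about TFRecords, here is a minimal in-memory NumPy sketch of the windowing described above (random data standing in for the 1000 x 3 dataframe):

```python
import numpy as np

# Toy stand-in for the 1000 x 3 dataframe: 2 feature columns + 1 target column.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3)).astype("float32")

timesteps = 5  # rows per sample

# Sliding windows with stride 1: sample i covers rows i .. i+4.
# Features: the 2 feature columns over all 5 rows -> shape (timesteps, 2).
# Target: the target column of the window's last row.
n_samples = data.shape[0] - timesteps + 1  # 996 windows
X = np.stack([data[i:i + timesteps, :-1] for i in range(n_samples)])
y = np.array([data[i + timesteps - 1, -1] for i in range(n_samples)])

print(X.shape)  # (996, 5, 2)
print(y.shape)  # (996,)
```

This is exactly the (samples, timesteps, features) layout an LSTM layer expects; the point of the question is producing the same windows from disk without materializing `X` in RAM.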

The data are on our hard disk in .tfrecords format. We cannot load all the data into RAM.

How would we go about it? Can you write the code for this toy example?


Answer 1:


I don't understand the question. This works out of the box with TFRecords:

import tensorflow as tf

# this will not load all the data into RAM
dataset = tf.data.TFRecordDataset("./path_to_tfrecord.tfrecord")
for sample in dataset:
    print(sample.numpy())  # raw serialized bytes of each record

To train (Keras `fit` takes the dataset as its first positional argument; there is no `train_data` parameter):

model.fit(dataset)
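For the actual toy problem (5-row windows, batch_size = 32), an end-to-end sketch might look like the following. The column names and the one-`tf.train.Example`-per-row layout are assumptions about how the Spark connector wrote the file, and the small writer at the top only exists to make the example self-contained; nothing in the reading pipeline loads the whole file into RAM.

```python
import numpy as np
import tensorflow as tf

TIMESTEPS = 5
BATCH_SIZE = 32
PATH = "toy.tfrecord"  # stand-in path for the real file

# --- Write a toy file: 100 rows, one tf.train.Example per row (assumed layout).
rng = np.random.default_rng(0)
with tf.io.TFRecordWriter(PATH) as w:
    for temp, wind, energy in rng.normal(size=(100, 3)):
        ex = tf.train.Example(features=tf.train.Features(feature={
            "temperature": tf.train.Feature(float_list=tf.train.FloatList(value=[temp])),
            "wind_speed": tf.train.Feature(float_list=tf.train.FloatList(value=[wind])),
            "energy_consumption": tf.train.Feature(float_list=tf.train.FloatList(value=[energy])),
        }))
        w.write(ex.SerializeToString())

# --- Read, parse, window, batch: all streaming, nothing materialized in RAM.
feature_spec = {
    "temperature": tf.io.FixedLenFeature([], tf.float32),
    "wind_speed": tf.io.FixedLenFeature([], tf.float32),
    "energy_consumption": tf.io.FixedLenFeature([], tf.float32),
}

def parse_row(serialized):
    row = tf.io.parse_single_example(serialized, feature_spec)
    return tf.stack([row["temperature"], row["wind_speed"]]), row["energy_consumption"]

dataset = (tf.data.TFRecordDataset(PATH)
           .map(parse_row)
           # sliding windows of 5 consecutive rows, stride 1
           .window(TIMESTEPS, shift=1, drop_remainder=True)
           .flat_map(lambda f, t: tf.data.Dataset.zip((f.batch(TIMESTEPS),
                                                       t.batch(TIMESTEPS))))
           # target of a window = target of its last row
           .map(lambda f, t: (f, t[-1]))
           .batch(BATCH_SIZE))

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(16, input_shape=(TIMESTEPS, 2)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(dataset, epochs=1, verbose=0)
```

The `window` + `flat_map` pair is the standard `tf.data` idiom for turning a per-row stream into overlapping fixed-length sequences without ever holding all windows in memory at once.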

Can you give a few samples of what gets printed? (With "..."s to shorten the output if necessary.)



Source: https://stackoverflow.com/questions/60126186/transforming-the-data-stored-in-tfrecord-format-to-become-inputs-to-a-lstm-keras
