Convert MNIST data set from CSV to ubyte format

走远了吗. Submitted on 2020-12-15 01:59:39

Question


I'm working with the MNIST data set. I downloaded the original binary files (the -ubyte files; the training image set is 60,000 rows x 784 columns) and converted them to CSV so I could do some processing on them.

Now I want to convert the CSV files back to ubyte format so I can upload them to a pipeline I'm testing.

I found this code, but I would have thought that converting .csv to ubyte would be a common task, particularly as the MNIST data set is so well known. Am I missing something? Is there a simpler solution that someone knows of (e.g. something in pandas or NumPy)?

Edit 1: I tried this:

import sys
output_file = open(sys.argv[2], 'wb')
for line in open(sys.argv[1]):
    output_file.write(line)
output_file.close()

But I got:

  File "4.convert_to_binary.py", line 4, in <module>
    output_file.write(line)
TypeError: a bytes-like object is required, not 'str'
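
For context, a file opened with 'wb' accepts only bytes, while iterating over a file opened in text mode yields str, which is what the traceback is reporting. The sketch below is only an illustration (the sample line is made up): encoding the string would silence the error, but it would write the ASCII characters of the CSV text rather than one byte per pixel value, so it would not produce the ubyte format.

```python
# Hypothetical sample line, as yielded by iterating over a text-mode file.
line = "0,0,255\n"

# Encoding turns the str into bytes, so write() on a 'wb' file would accept it,
# but the result is just the CSV text as ASCII characters.
encoded = line.encode("ascii")

# One unsigned byte per value is what a ubyte file stores for each pixel.
pixels = bytes(int(value) for value in line.strip().split(","))

print(len(encoded), len(pixels))
```

The first length counts characters of text, the second counts pixel values, which is why a plain copy loop cannot be the whole conversion.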

Edit 2: I think it's clear because I said I'm using the MNIST data set, but to be explicit: I'm talking about the specific binary format that MNIST uses, as described here

Edit 3: I've made a little progress: I think the format I'm looking for is the IDX format described here, so I think my question is now clearer: how do I convert from CSV to IDX binary (assuming this post is right and MNIST uses IDX binary)?
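
One possible approach, offered as a minimal sketch rather than a definitive answer: as far as I know, an IDX file starts with a 4-byte magic number (two zero bytes, a type code where 0x08 means unsigned byte, and the number of dimensions), followed by each dimension size as a 4-byte big-endian integer, followed by the raw data. The function names `write_idx` and `csv_images_to_idx` are made up for illustration, and the code assumes the CSV has no header row and holds only the 784 pixel columns (no label column), with values in the range 0 to 255.

```python
import struct

import numpy as np


def write_idx(path, array):
    """Write a NumPy array to `path` as unsigned bytes in IDX layout."""
    array = np.ascontiguousarray(array).astype(np.uint8)
    with open(path, "wb") as f:
        # Magic number: 0x00, 0x00, type code 0x08 (unsigned byte), dimension count.
        f.write(struct.pack(">BBBB", 0, 0, 0x08, array.ndim))
        # Each dimension size is a 4-byte big-endian unsigned integer.
        f.write(struct.pack(">" + "I" * array.ndim, *array.shape))
        # Row-major raw data, one byte per value.
        f.write(array.tobytes())


def csv_images_to_idx(csv_path, idx_path, rows=28, cols=28):
    """Convert a pixel-only CSV (assumed: no header, rows*cols columns) to IDX."""
    # Load as float first so values written as "0" or "0.0" both parse.
    data = np.loadtxt(csv_path, delimiter=",", ndmin=2)
    images = data.astype(np.uint8).reshape(-1, rows, cols)
    write_idx(idx_path, images)
```

Under the same assumptions, labels stored as a 1-D array could go through `write_idx` directly, e.g. `write_idx("labels-idx1-ubyte", np.array([5, 0, 4]))` (file name hypothetical). If the CSV does carry a label column or a header row, the loading step would need adjusting before the reshape.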

Source: https://stackoverflow.com/questions/65143325/convert-mnist-data-set-from-csv-to-ubyte-format
