I'm currently working on a case study for which I need to work on the MNIST database.
The files in this site are said to be in IDX file format. I tried to take a look at th
Following up on the darch (not Darch) package mentioned above:
The package is called darch. It has been moved to MRAN (Microsoft R Application Network) but is available on CRAN as well.
It provides two functions for the MNIST data:
- readMNIST, which reads the ubyte files stored on your hard drive and saves them as test.Rdata and train.Rdata archives.
- provideMNIST, which downloads the files and calls readMNIST on them.
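To make "reads the ubyte files" concrete, here is a minimal base-R sketch of reading an uncompressed IDX file. It is not darch's implementation; the function name read_idx is made up for illustration, and it only handles the unsigned-byte type that the MNIST files use.

```r
# Sketch: read an uncompressed IDX file (unsigned-byte data only).
# read_idx is a hypothetical helper, not part of darch.
read_idx <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  # Header: two zero bytes, a type code, and the number of dimensions.
  magic <- readBin(con, integer(), n = 4, size = 1, signed = FALSE)
  stopifnot(magic[1] == 0, magic[2] == 0, magic[3] == 8)  # 0x08 = unsigned byte
  ndim <- magic[4]
  # Each dimension size is a 4-byte big-endian integer.
  dims <- readBin(con, integer(), n = ndim, size = 4, endian = "big")
  values <- readBin(con, integer(), n = prod(dims), size = 1, signed = FALSE)
  if (ndim == 1) return(values)  # label file: a plain vector
  # Image file: one row per image, pixels flattened in stored order.
  matrix(values, nrow = dims[1], byrow = TRUE)
}
```

For a label file this returns an integer vector; for an image file it returns a matrix with one row per image, so a call such as read_idx("t10k-labels-idx1-ubyte") would give the test labels, assuming that file sits in the working directory.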
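When calling these functions, pass the directory path with its trailing separator, e.g. readMNIST("..\\MNIST\\") (the last slash is required). Inside an R string each backslash has to be written twice; forward slashes, as in "../MNIST/", also work.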
If you download the files yourself, you will need to rename them: the gz archives contain files with an extension, like t10k-labels.idx1-ubyte, but readMNIST looks for files without an extension, like t10k-labels-idx1-ubyte. So you have to change the dot to a dash (this applies to darch version 0.12.0; maybe they'll fix it).
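The renaming can be scripted rather than done by hand. The following is a rough sketch in base R; the helper name rename_mnist_files is invented here, and the pattern assumes the extracted files all end in .idxN-ubyte as in the example above.

```r
# Sketch: turn the dot before "idx" into a dash for every extracted file,
# e.g. "t10k-labels.idx1-ubyte" -> "t10k-labels-idx1-ubyte".
# rename_mnist_files is a hypothetical helper, not part of darch.
rename_mnist_files <- function(dir) {
  old <- list.files(dir, pattern = "\\.idx[0-9]-ubyte$", full.names = TRUE)
  new <- file.path(dirname(old), sub("\\.idx", "-idx", basename(old)))
  file.rename(old, new)
  invisible(basename(new))
}
```

Calling rename_mnist_files("../MNIST") would then rename the files in that folder in place; files that already use the dash are left untouched because they do not match the pattern.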
To load the files in R you need to use the load function, e.g. load("..\\MNIST\\test.Rdata"). This will create the matrices trainData and testData in the environment.
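If you want to check what an archive actually contains before it lands in your workspace, load() returns the names of the objects it restores. A small sketch, with describe_rdata as a made-up helper name, that loads into a scratch environment and reports each object's dimensions:

```r
# Sketch: list the objects stored in an .Rdata file and their dimensions
# without touching the global environment.
# describe_rdata is a hypothetical helper, not part of darch.
describe_rdata <- function(path) {
  env <- new.env()
  loaded <- load(path, envir = env)  # character vector of restored object names
  lapply(mget(loaded, envir = env), dim)
}
```

For example, describe_rdata("..\\MNIST\\test.Rdata") would return a named list with one entry per stored object, assuming the archive exists at that path.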
For some reason I did not get any dimnames for the matrices.