fetch_mldata: how to manually set up MNIST dataset when source server is down?

清歌不尽 2020-12-18 08:42

I need to run some code that contains these lines:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

There se

1 Answer
  • 2020-12-18 09:20

    By default, fetch_mldata checks ~/scikit_learn_data/mldata to see whether the dataset has already been downloaded.

    According to the source code:

        # if the file does not exist, download it
        if not exists(filename):
            urlname = MLDATA_BASE_URL % quote(dataname)
    

    So in your case, it will check the location

    ~/scikit_learn_data/mldata/mnist-original.mat
    

    and if not found, it will download from

    http://mldata.org/repository/data/download/matlab/mnist-original.mat
    

    which is currently down, as you suspected.
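The lookup described above can be sketched in a few lines. This is only an illustration, not scikit-learn's own code: `mldata_cache_path` and `is_cached` are hypothetical helpers, and the name normalization (lower-casing and turning spaces into hyphens) is an assumption inferred from 'MNIST original' mapping to mnist-original.mat.

```python
import os


def mldata_cache_path(dataname, data_home):
    """Return the path where the cached .mat file is expected (hypothetical helper)."""
    # Assumption: the dataset name is lower-cased and spaces become hyphens,
    # which is consistent with 'MNIST original' -> 'mnist-original.mat'.
    filename = dataname.lower().replace(" ", "-") + ".mat"
    return os.path.join(data_home, "mldata", filename)


def is_cached(dataname, data_home):
    """True if the expected file already exists, so no download would be attempted."""
    return os.path.exists(mldata_cache_path(dataname, data_home))


if __name__ == "__main__":
    home = os.path.expanduser(os.path.join("~", "scikit_learn_data"))
    print(mldata_cache_path("MNIST original", home))
    print("cached:", is_cached("MNIST original", home))
```

If the second line prints `cached: False`, the file is missing and a download from mldata.org would be attempted.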

    So you can download the dataset from another location instead, for example:

    https://github.com/amplab/datascience-sp14/blob/master/lab7/mldata/mnist-original.mat
    

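    and save it in the folder above, ~/scikit_learn_data/mldata (on that GitHub page, use the download link so that you get the .mat file itself rather than the HTML page).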

    After that, when you run fetch_mldata(), it should pick up the downloaded dataset without connecting to mldata.org.
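Placing the file by hand works fine, but the step can also be scripted. The following is a minimal sketch: `install_mldata_file` is a hypothetical helper (not part of scikit-learn), and the check for an HTML page is only a rough guard against accidentally saving a web page instead of the dataset.

```python
import os
import shutil


def install_mldata_file(downloaded_file, data_home, filename="mnist-original.mat"):
    """Copy a manually downloaded .mat file into <data_home>/mldata (hypothetical helper)."""
    # Rough sanity check: an HTML page saved by mistake starts with markup.
    with open(downloaded_file, "rb") as fh:
        head = fh.read(64).lstrip().lower()
    if head.startswith((b"<!doctype", b"<html")):
        raise ValueError("%s looks like an HTML page, not a .mat file" % downloaded_file)

    # Create the mldata folder if it does not exist yet, then copy the file in.
    target_dir = os.path.join(os.path.expanduser(data_home), "mldata")
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, filename)
    shutil.copyfile(downloaded_file, target)
    return target
```

For example, `install_mldata_file("Downloads/mnist-original.mat", "~/scikit_learn_data")` would copy the file to ~/scikit_learn_data/mldata/mnist-original.mat; the source path here is just a placeholder for wherever you saved the download.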

    Update:

    Here ~ refers to the user's home folder. You can use the following code to print the default location of the scikit-learn data folder (the scikit_learn_data directory) on your system:

    from sklearn.datasets import get_data_home
    print(get_data_home())
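If scikit-learn is not importable in the environment where you are placing the file, the same location can be approximated with the standard library. This sketch assumes the documented behaviour of get_data_home: it uses the SCIKIT_LEARN_DATA environment variable when set and falls back to ~/scikit_learn_data otherwise. `guess_data_home` is a hypothetical name, and unlike get_data_home it does not create the directory.

```python
import os


def guess_data_home(environ=None):
    """Approximate scikit-learn's data folder without importing it (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: SCIKIT_LEARN_DATA overrides the default ~/scikit_learn_data.
    raw = environ.get("SCIKIT_LEARN_DATA", os.path.join("~", "scikit_learn_data"))
    return os.path.expanduser(raw)


if __name__ == "__main__":
    # The file from the answer above would then be expected here:
    print(os.path.join(guess_data_home(), "mldata", "mnist-original.mat"))
```

When scikit-learn is available, prefer get_data_home() itself, since it reflects what the installed version actually does.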
    