How to use a hash function for storing ~4 million images in the file system [closed]

青春壹個敷衍的年華 submitted on 2019-12-23 05:13:53

Question


I want to store ~1 million images, each of which will be resized into 4 different variants, so there will be ~4 million images in total. How should I use a hash function like md5 to distribute the images evenly and uniquely across a directory structure?


Answer 1:


As others have noted, multiple file names can theoretically hash to the same value. That's easily solved by keeping the original filename, in addition to the hash.

In the following, I'm assuming that your one million input files have unique file names.

This example will also put the original and its thumbnails in the same directory. That will make it easy to remove or find files.

First of all, you'll want a method to map a file name to a directory:

// $id = A unique identifier (a filename)
//       It could be useful to make this id the same for the original, 
//       as well as any thumbnails. Your image and variants will all
//       then end up in the same directory.

// $levels_deep = The number of directories deep you want to go.
//                Want more levels? Use a hashing method with a longer
//                output, such as sha1 (40 characters).

function getDir($id, $levels_deep = 32) {
    $file_hash   = md5($id);
    $dirname     = implode("/", str_split(
        substr($file_hash, 0, $levels_deep)
    ));
    return $dirname;
}
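
The comment above suggests switching to a longer hash when more levels are needed. A minimal sketch of that idea, assuming PHP's built-in `hash()` function; the name `getDirWithAlgo` and the `$algo` parameter are illustrative and not part of the original answer:

```php
<?php
// Hypothetical variant of getDir(): same splitting logic, but the hash
// algorithm is a parameter so that deeper trees are possible.
function getDirWithAlgo($id, $levels_deep = 4, $algo = "sha1") {
    // hash() returns lowercase hex: 32 characters for "md5", 40 for "sha1".
    $file_hash = hash($algo, $id);
    // One hex character per directory level, joined with "/".
    return implode("/", str_split(substr($file_hash, 0, $levels_deep)));
}

echo getDirWithAlgo("myfile.jpg", 6), "\n";        // six levels taken from sha1
echo getDirWithAlgo("myfile.jpg", 4, "md5"), "\n"; // same path as getDir("myfile.jpg", 4)
```

With one hex character per level, each directory holds at most 16 subdirectories, so the depth alone controls how many leaf directories exist.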

Next, you need to write out the files:

function store($dirname, $filename) {
    // The `true` flag here will have `mkdir` create directories recursively.  
    if(!file_exists($dirname) && !mkdir($dirname, 0777, true))
        throw new Exception("Could not create directory " . $dirname);

    return file_put_contents(
        $dirname . "/" . $filename,
        "Contents of example file.\n"
    );
}

Example use:

store(getDir("myfile.jpg", 4), "myfile.jpg");
store(getDir("myfile.jpg", 4), "myfile_large.jpg");
store(getDir("myfile.jpg", 4), "myfile_small.jpg");
store(getDir("myfile.jpg", 4), "myfile_thumb.jpg");
store(getDir("someOtherFile.jpg", 4), "someOtherFile.jpg");

This will store the five files above at these locations:

/d/0/6/a/myfile_large.jpg
/d/0/6/a/myfile_small.jpg
/d/0/6/a/myfile_thumb.jpg
/d/0/6/a/myfile.jpg
/1/4/4/d/someOtherFile.jpg

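I have not tested how evenly the leading characters of an md5 hash are distributed, but md5 output is designed to look uniformly random, so the files ought to spread evenly enough across the directories.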
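
One way to get a feel for the distribution is to hash a batch of synthetic names and count how many land in each directory. The sketch below is only an illustration: `bucketCounts` is a hypothetical helper, and the names of the form `image_N.jpg` are made up. For scale, four levels give 16^4 = 65,536 possible directories, so ~4 million files spread evenly would come to roughly 61 files per directory.

```php
<?php
// Hypothetical helper: hash $n synthetic ids and count how many fall
// under each directory prefix of length $levels_deep.
function bucketCounts($n, $levels_deep) {
    $counts = [];
    for ($i = 0; $i < $n; $i++) {
        $prefix = substr(md5("image_" . $i . ".jpg"), 0, $levels_deep);
        $counts[$prefix] = ($counts[$prefix] ?? 0) + 1;
    }
    return $counts;
}

// Two levels (16^2 = 256 directories) keep the demo fast.
$counts = bucketCounts(100000, 2);
printf("directories used:    %d\n", count($counts));
printf("min per directory:   %d\n", min($counts));
printf("max per directory:   %d\n", max($counts));
printf("ideal per directory: %.1f\n", 100000 / 256);
```

If the minimum and maximum stay close to the ideal value, the prefix is spreading the files evenly for that kind of input.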




Answer 2:


MD5 does not always generate unique values. If it is OK to rename the image files to an increasing number, you can save each image as {number}_{variant}.jpg, for example 1_1.jpg, 1_2.jpg, 2_1.jpg and so on.

To make the names look a bit more random, you can convert the increasing number from base 10 to base 26. In that case, image 82981_1.jpg would become 4IJF_1.jpg.
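
The example name is consistent with PHP's built-in `base_convert()`, which uses the digits 0-9 followed by letters for bases above 10. A minimal sketch, where `variantName` is a hypothetical helper name and the `.jpg` extension is assumed:

```php
<?php
// Hypothetical helper: build "{id in base 26}_{variant}.jpg".
function variantName($id, $variant) {
    // base_convert() returns lowercase letters, so upper-case the result.
    $encoded = strtoupper(base_convert((string) $id, 10, 26));
    return $encoded . "_" . $variant . ".jpg";
}

echo variantName(82981, 1), "\n"; // 4IJF_1.jpg
```

The mapping is reversible, so the numeric ID can be recovered with `base_convert("4IJF", 26, 10)`.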

If you use a database, you can store the original filename there and rename the file as above, using the ID of the corresponding record. Using a database would also give you an easy way to validate requests and store statistics.




Answer 3:


MD5 is meant for checking the consistency of a file. Two different pictures can have the same hash, so it is better not to use hash functions here. You can name your pictures like this:

Timestamp_Number_1OfThe4Kinds
Example: 123456789_12_3.png

How to get the image name:

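function getname($dir, $kindofpicture) {
    // Increment the counter until the generated name is unused.
    $i = 0;
    do {
        $i++;
        // Timestamp_Number_Kind, matching the pattern above.
        $str = $dir . time() . "_" . $i . "_" . $kindofpicture . ".png";
    } while (file_exists($str));
    return $str;
}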


Source: https://stackoverflow.com/questions/23787785/how-to-use-hash-function-for-storing-4-million-images-in-file-system
