Tips for managing a large number of files?

走了就别回头了 2020-12-13 21:45

There are some very good questions here on SO about file management and storing within a large project.

Storing Images in DB - Yea or Nay?
Would

6 Answers
  • 2020-12-13 22:02

    I ran into this problem some time ago for a website that was hosting a lot of files. What we did was take a GUID (which is also the primary key field of a file), e.g. BCC46E3F-2F7A-42b1-92CE-DBD6EC6D6301, and store the file like this: /B/C/C/BCC46E3F-2F7A-42b1-92CE-DBD6EC6D6301/filename.ext

    This has certain advantages:

    • You can scale out the file servers over multiple servers (and assign specific directories to each one)
    • You don't have to rename the file
    • Your directories are guaranteed to be unique
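
    The layout above could be sketched roughly like this in Python. The function name guid_storage_path and the root parameter are hypothetical, chosen only for illustration; the shard directories are simply the first three GUID characters, upper-cased to match the example path.

```python
import uuid
from pathlib import PurePosixPath


def guid_storage_path(guid: str, filename: str, root: str = "/") -> str:
    """Build <root>/B/C/C/<GUID>/<filename> from the first three GUID characters.

    Hypothetical helper: the name and the `root` default are assumptions,
    not part of the original answer.
    """
    # Reject anything that is not a well-formed GUID (raises ValueError).
    uuid.UUID(guid)
    # One directory level per leading character, upper-cased as in the example.
    shards = [ch.upper() for ch in guid[:3]]
    return str(PurePosixPath(root, *shards, guid, filename))


if __name__ == "__main__":
    print(guid_storage_path("BCC46E3F-2F7A-42b1-92CE-DBD6EC6D6301", "filename.ext"))
```

    Because the original filename lives inside a directory named after the GUID, two uploads with the same name never collide.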

    Hope this helps!

  • 2020-12-13 22:02

    I usually take this approach:

    Have a global settings variable for your application that points to the folder where you store uploaded files. In your database store the relative paths to the files (relative to what the settings variable points to).

    So if a file is located at /www/uploads/image.jpg, your settings variable points to /www/uploads and your database row holds image.jpg. This is a flexible approach that decouples your system's directory structure from your application.

    Further, you can fragment file storage into directories based on the database tables the files relate to. Say you have a table user_reports and a table user_photos. You store the files that relate to user_reports in /www/uploads/user_reports. If you have a large number of user uploads, you can take the fragmentation even further. Say a user uploads a file called report.pdf on 20.03.2009; you store it at /www/uploads/user_reports/2009/03/20/report.pdf.
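
    A minimal sketch of this scheme in Python, assuming the settings variable is a module-level constant. UPLOAD_ROOT, relative_upload_path and absolute_path are hypothetical names used only for illustration. The relative path is what goes into the database row; the absolute path is resolved only when the file is actually read or written.

```python
from datetime import date
from pathlib import PurePosixPath

# Assumed stand-in for the application's global settings variable.
UPLOAD_ROOT = PurePosixPath("/www/uploads")


def relative_upload_path(table: str, uploaded_on: date, filename: str) -> str:
    """Return the path stored in the database, relative to UPLOAD_ROOT."""
    # Fragment by table name, then by upload date as year/month/day.
    return str(PurePosixPath(table, f"{uploaded_on:%Y/%m/%d}", filename))


def absolute_path(relative: str) -> str:
    """Resolve a stored relative path against the configured upload root."""
    return str(UPLOAD_ROOT / relative)


if __name__ == "__main__":
    rel = relative_upload_path("user_reports", date(2009, 3, 20), "report.pdf")
    print(rel)
    print(absolute_path(rel))
```

    Moving the uploads to another disk or server then only means changing UPLOAD_ROOT; the database rows stay untouched.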

  • 2020-12-13 22:08

    In order to avoid creating an excessive number of entries in a single directory, you may want to base creating directories on pieces of the filename. So for instance, if you have a file named d7f5ae9b7c5a.png, you may want to store it in media/d7/f5/d7f5ae9b7c5a.png. If your filenames are all hexadecimal then this will restrict the number of entries in a single directory to 256 up until the final level.
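
    As a rough sketch, the prefix split could look like this in Python. The function name sharded_path and its levels and width parameters are hypothetical; with the defaults it reproduces the media/d7/f5/... example above.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, root: str = "media",
                 levels: int = 2, width: int = 2) -> str:
    """Place a file under directories named after leading pieces of its name.

    Hypothetical helper: `levels` is the number of directory levels and
    `width` the number of filename characters used per level.
    """
    stem = filename.split(".", 1)[0]
    if len(stem) < levels * width:
        raise ValueError("filename is too short for the requested sharding")
    # Consecutive fixed-width slices of the name become directory names.
    parts = [stem[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(root, *parts, filename))


if __name__ == "__main__":
    print(sharded_path("d7f5ae9b7c5a.png"))
```

    With two hexadecimal characters per level, each directory above the final level holds at most 16 × 16 = 256 entries.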

  • 2020-12-13 22:20

    I can't say much about how Apache and PHP manage files, but I can say something about the ext3 file system. ext3 does not seem to have problems with large numbers of files in the same directory; I've tested it with up to a million files. Make sure the dir_index option is enabled on the file system before creating the directories. You can check by running dumpe2fs and change this option by running tune2fs. Hashing the files into a tree of subdirectories can still be useful, because command-line tools can still have problems listing the contents of such a large directory.

  • 2020-12-13 22:22
    1. One user image is ~100 KB. With 10,000 users in the database and an average of 5 images per user, that is about 5 GB of image data in the DB. Every image request will also be served through the DB, and this extra traffic will reduce overall DB server performance. You could use a DB cluster to avoid this, but that is likely to be expensive.

    2. A user reports an error on the live database (on test, everything works correctly). How would you create a dump and unpack it on a developer's machine? How much time would it take?

    3. At some point you may decide to put the images on a CDN. What changes would that require in your source code?

  • 2020-12-13 22:23

    One way is to assign a unique number to each file and use it to look up the actual file location. Then you can use that number to distribute files across different directories in the filesystem. For example, you could use something like this scheme:

    /images/{0}/{1}/{2}

    {0}: file_number % 100
    {1}: (file_number / 100) % 100   (integer division)
    {2}: file_number
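
    A minimal sketch of that scheme in Python, assuming non-negative integer file numbers. The function name numbered_path and the root default are hypothetical; note the integer division (//) for the second level.

```python
def numbered_path(file_number: int, root: str = "/images") -> str:
    """Map a unique file number to <root>/{0}/{1}/{2} as in the scheme above."""
    if file_number < 0:
        raise ValueError("file_number must be non-negative")
    level0 = file_number % 100            # {0}: last two decimal digits
    level1 = (file_number // 100) % 100   # {1}: next two decimal digits
    return f"{root}/{level0}/{level1}/{file_number}"


if __name__ == "__main__":
    print(numbered_path(123456))
```

    Each of the two directory levels has at most 100 entries, so sequentially numbered files spread evenly across up to 10,000 leaf directories.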
