How do you deal with lots of small files?

慢半拍i 2020-12-08 04:48

A product that I am working on collects several thousand readings a day and stores them as 64k binary files on an NTFS partition (Windows XP). After a year in production the …

14 Answers
  • 2020-12-08 05:07

    I have seen vast improvements in the past from splitting the files up into a nested hierarchy of directories by, e.g., the first and then the second letter of the file name; each directory then no longer contains an excessive number of files. Operations that touch the whole database are still slow, however.
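
    A minimal sketch of this letter-based layout (the `shard_path` helper name and the `_` padding for very short names are my own conventions, not from the answer):

    ```python
    import os

    def shard_path(root, filename):
        # Shard by the first and second characters of the file name,
        # e.g. "reading042.bin" -> root/r/e/reading042.bin; "_" pads
        # names shorter than two characters (assumed convention).
        name = os.path.basename(filename)
        first = name[0] if len(name) > 0 else "_"
        second = name[1] if len(name) > 1 else "_"
        return os.path.join(root, first, second, name)
    ```

    With several thousand files a day this keeps any single directory small, at the cost of two extra path components per lookup.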

  • 2020-12-08 05:08

    Having hundreds of thousands of files in a single directory will indeed cripple NTFS, and there is not really much you can do about that. You should reconsider storing the data in a more practical format, like one big tarball or in a database.

    If you really need a separate file for each reading, you should sort them into several subdirectories instead of keeping them all in the same directory. Create a hierarchy of directories and place each file in one of them based on its file name; that way you can still store and load a file knowing only its name.

    The method we use is to take the last few letters of the file name, reverse them, and create one-letter directories from that. Consider the following files, for example:

    1.xml
    24.xml
    12331.xml
    2304252.xml
    

    You can sort them into directories like so:

    data/1.xml
    data/24.xml
    data/1/3/3/12331.xml
    data/2/5/2/4/0/2304252.xml
    

    This scheme ensures that you never have more than 100 files in each directory: the leaf directory is determined by everything except the last two digits of the name, so only those two digits (00–99) can vary within any one directory.
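
    The example paths above can be reproduced with a short sketch, assuming the rule is: reverse the file-name stem and use all but its last two characters as one-character directory levels (the helper name is mine):

    ```python
    import os

    def reversed_shard_path(root, filename):
        # Reverse the stem and use all but its last two characters as
        # one-character directory levels, so only the last two digits
        # of a name can vary within any one leaf directory.
        stem, _ext = os.path.splitext(filename)
        levels = stem[::-1][:max(0, len(stem) - 2)]
        return os.path.join(root, *levels, filename)
    ```

    Short names (two characters or fewer) produce no directory levels and stay directly under the root, matching the `data/1.xml` and `data/24.xml` examples.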

  • 2020-12-08 05:11

    NTFS will actually perform fine with many more than 10,000 files in a directory, as long as you tell it to stop creating alternative file names compatible with 16-bit Windows platforms. By default NTFS automatically creates an '8 dot 3' file name for every file that is created. This becomes a problem when there are many files in a directory, because Windows has to check the existing files to make sure the 8.3 name it is generating isn't already in use. You can disable 8.3 naming by setting the NtfsDisable8dot3NameCreation registry value to 1; the value lives under the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\FileSystem registry path. The change is safe, as 8.3 names are only required by programs written for very old versions of Windows.

    A reboot is required before this setting will take effect.

  • 2020-12-08 05:12

    The performance issue is caused by the huge number of files in a single directory: once you eliminate that, you should be fine. This isn't an NTFS-specific problem: in fact, it's commonly encountered with user home/mail files on large UNIX systems.

    One obvious way to resolve the issue is to move the files into folders named after parts of the file name. Assuming all your files have names of similar length, e.g. ABCDEFGHI.db, ABCEFGHIJ.db, etc., create a directory structure like this:

    ABC\
        DEF\
            ABCDEFGHI.db
        EFG\
            ABCEFGHIJ.db
    

    Using this structure, you can quickly locate a file based on its name. If the file names have variable lengths, pick a maximum length, and prepend zeroes (or any other character) in order to determine the directory the file belongs in.
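
    A sketch of this prefix layout (the chunk size, depth, and padding width are assumptions matching the nine-character example above, and the helper name is mine):

    ```python
    import os

    def prefix_shard_path(root, filename, chunk=3, depth=2, width=9):
        # Pad short stems to a fixed width with leading zeroes, then
        # use the first `depth` groups of `chunk` characters as
        # directory levels: ABCDEFGHI.db -> root/ABC/DEF/ABCDEFGHI.db
        stem, _ext = os.path.splitext(filename)
        padded = stem.rjust(width, "0")
        dirs = [padded[i * chunk:(i + 1) * chunk] for i in range(depth)]
        return os.path.join(root, *dirs, filename)
    ```

    The zero padding makes variable-length names land in a predictable directory, as the answer suggests.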

  • 2020-12-08 05:12

    Rename the folder each day with a time stamp.

    If the application is saving the files into c:\Readings, then set up a scheduled task to rename c:\Readings at midnight and create a new empty folder.

    Then you will get one folder for each day, each containing several thousand files.

    You can extend the method further to group by month; for example, c:\Readings becomes c:\Archive\September\22.

    You have to be careful with your timing to ensure you are not trying to rename the folder while the product is saving to it.
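
    A hedged sketch of the rotation, with folder names following the c:\Archive\September\22 example (the helper names and parameters are mine, not from the answer):

    ```python
    import os
    from datetime import date

    def archive_path(archive_root, day):
        # Group archived days by month name, e.g. Archive\September\22.
        return os.path.join(archive_root, day.strftime("%B"), str(day.day))

    def rotate_daily(readings_dir, archive_root, day):
        # Meant to run from a scheduled task just after midnight,
        # while nothing is writing into readings_dir: move the live
        # folder aside, then recreate it empty for the new day.
        dest = archive_path(archive_root, day)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.rename(readings_dir, dest)
        os.makedirs(readings_dir)
        return dest
    ```

    The rename is atomic on the same volume, which is why the only race to worry about is the one the answer mentions: a write landing mid-rename.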

  • 2020-12-08 05:17

    One common trick is to simply create a handful of subdirectories and divvy up the files.

    For instance, Doxygen, an automated code documentation tool that can produce tons of HTML pages, has an option for creating a two-level-deep directory hierarchy; the files are then evenly distributed across the bottom-level directories.
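
    The same idea can be sketched generically by hashing the file name into a fixed two-level hierarchy (an illustration of the technique, not Doxygen's actual code; the helper name is mine):

    ```python
    import hashlib
    import os

    def hashed_path(root, filename, width=2):
        # The first 2*width hex digits of an MD5 of the name pick two
        # directory levels; a hash spreads files evenly, giving
        # 16**width directories per level regardless of naming.
        digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
        return os.path.join(root, digest[:width], digest[width:2 * width], filename)
    ```

    Unlike the letter- or digit-based schemes above, hashing balances the tree even when file names are highly skewed.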
