Fast Search to see if a String Exists in Large Files with Delphi

抹茶落季 2020-12-14 13:31

I have a FindFile routine in my program which will list files, but if the "Containing Text" field is filled in, then it should only list files containing that text.

6 answers
  •  余生分开走
    2020-12-14 14:18

    The best approach here is probably to use memory mapped files.

    First you need a file handle; use the CreateFile Windows API function for that.

    Then pass that to CreateFileMapping to get a file mapping handle. Finally use MapViewOfFile to map the file into memory.
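
    As a rough illustration of that sequence, here is a minimal sketch that maps a whole file read-only and releases everything in reverse order. It assumes a recent Delphi with the Winapi.Windows unit name; MapWholeFile and the program name are hypothetical names made up for this example. Mapping an entire file in one view will fail for very large files in a 32-bit process.

    ```delphi
    program MapFileDemo;

    {$APPTYPE CONSOLE}

    uses
      Winapi.Windows, System.SysUtils;

    // Hypothetical helper: maps FileName read-only, prints its first byte,
    // and returns the file size in bytes.
    function MapWholeFile(const FileName: string): Int64;
    var
      FileHandle, MappingHandle: THandle;
      View: PByte;
    begin
      Result := 0;
      FileHandle := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
        nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
      if FileHandle = INVALID_HANDLE_VALUE then
        RaiseLastOSError;
      try
        if not GetFileSizeEx(FileHandle, Result) then
          RaiseLastOSError;
        if Result = 0 then
          Exit; // an empty file cannot be mapped
        MappingHandle := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
        if MappingHandle = 0 then
          RaiseLastOSError;
        try
          // Offset 0 and length 0 mean "map the whole file"
          View := MapViewOfFile(MappingHandle, FILE_MAP_READ, 0, 0, 0);
          if View = nil then
            RaiseLastOSError;
          try
            Writeln(Format('Mapped %d bytes, first byte is $%.2x', [Result, View^]));
          finally
            UnmapViewOfFile(View);
          end;
        finally
          CloseHandle(MappingHandle);
        end;
      finally
        CloseHandle(FileHandle);
      end;
    end;

    begin
      try
        if ParamCount = 1 then
          MapWholeFile(ParamStr(1))
        else
          Writeln('Usage: MapFileDemo <file>');
      except
        on E: Exception do
          Writeln(E.ClassName, ': ', E.Message);
      end;
    end.
    ```

    The nested try/finally blocks make sure the view is unmapped before the mapping handle is closed, and the mapping handle before the file handle, even if an error occurs part-way through.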

    To handle large files, MapViewOfFile can map just a range of the file into memory, so you can, for example, map the first 32 MB, call UnmapViewOfFile to unmap it, then call MapViewOfFile for the next 32 MB, and so on. (EDIT: as was pointed out below, make sure that consecutive blocks overlap by at least the length of the text you are searching for, so that you do not miss text that is split across a block boundary. Note that each view has to start at a file offset that is a multiple of the system allocation granularity, which GetSystemInfo reports and which is typically 64 KB rather than 4 KB.)
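
    The block-by-block mapping could look roughly like the sketch below. It is only one way to arrange the overlap: every view starts at a multiple of the allocation granularity and extends PatternLen - 1 bytes past the start of the next view, so a match that straddles a boundary is always seen whole by the earlier view. ScanInWindows, TargetWindow and the program name are hypothetical, and the actual search is left as a comment.

    ```delphi
    program WindowedMapDemo;

    {$APPTYPE CONSOLE}

    uses
      Winapi.Windows, System.SysUtils;

    const
      TargetWindow = 32 * 1024 * 1024; // bytes advanced per view (assumed value)

    // Hypothetical helper: walks FileName in overlapping views and returns how
    // many views were mapped. PatternLen is the byte length of the search text.
    function ScanInWindows(const FileName: string; PatternLen: Integer): Integer;
    var
      SysInfo: TSystemInfo;
      FileHandle, MappingHandle: THandle;
      FileSize, Offset, Step, MapLen: Int64;
      View: PByte;
    begin
      Result := 0;
      if PatternLen < 1 then
        raise EArgumentException.Create('PatternLen must be at least 1');
      GetSystemInfo(SysInfo);
      // View offsets must be multiples of the allocation granularity,
      // so round the step down to such a multiple.
      Step := (TargetWindow div SysInfo.dwAllocationGranularity) *
        SysInfo.dwAllocationGranularity;
      if Step = 0 then
        Step := SysInfo.dwAllocationGranularity;
      FileHandle := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ,
        nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
      if FileHandle = INVALID_HANDLE_VALUE then
        RaiseLastOSError;
      try
        if not GetFileSizeEx(FileHandle, FileSize) then
          RaiseLastOSError;
        if FileSize = 0 then
          Exit; // an empty file cannot be mapped
        MappingHandle := CreateFileMapping(FileHandle, nil, PAGE_READONLY, 0, 0, nil);
        if MappingHandle = 0 then
          RaiseLastOSError;
        try
          Offset := 0;
          while Offset < FileSize do
          begin
            // Map one step plus the overlap, clipped to the end of the file
            MapLen := Step + PatternLen - 1;
            if MapLen > FileSize - Offset then
              MapLen := FileSize - Offset;
            View := MapViewOfFile(MappingHandle, FILE_MAP_READ,
              Int64Rec(Offset).Hi, Int64Rec(Offset).Lo, NativeUInt(MapLen));
            if View = nil then
              RaiseLastOSError;
            try
              // Search the bytes View[0 .. MapLen - 1] here
              Writeln(Format('view at offset %d, %d bytes', [Offset, MapLen]));
              Inc(Result);
            finally
              UnmapViewOfFile(View);
            end;
            Inc(Offset, Step);
          end;
        finally
          CloseHandle(MappingHandle);
        end;
      finally
        CloseHandle(FileHandle);
      end;
    end;

    begin
      try
        if ParamCount = 1 then
          ScanInWindows(ParamStr(1), 4)
        else
          Writeln('Usage: WindowedMapDemo <file>');
      except
        on E: Exception do
          Writeln(E.ClassName, ': ', E.Message);
      end;
    end.
    ```

    With this layout a match that starts inside one step is fully contained in that step's view, so no match is reported twice and none is lost at a boundary.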

    To do the actual searching once (part of) the file is mapped into memory, you can copy the source of StrPosLen from SysUtils.pas (unfortunately it is defined only in the implementation section and not exposed in the interface). Leave one copy as is and make a second copy, replacing every occurrence of Wide with Ansi. Also, if you want to be able to search binary files which might contain embedded #0's, you can remove the "(Str1[I] <> #0) and" part.
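
    If you would rather not copy RTL source, a plain byte search over the mapped range does the same job. The sketch below is not StrPosLen; it is a naive first-byte check followed by CompareMem, and BytePos is a hypothetical name. Because it works on lengths rather than terminators, embedded #0 bytes are not a problem.

    ```delphi
    program BytePosDemo;

    {$APPTYPE CONSOLE}
    {$POINTERMATH ON}

    uses
      System.SysUtils;

    // Hypothetical helper: returns the zero-based index of the first
    // occurrence of Pat in Buf, or -1 if there is none.
    function BytePos(Buf: PByte; BufLen: NativeInt; Pat: PByte;
      PatLen: NativeInt): NativeInt;
    var
      I, Last: NativeInt;
    begin
      Result := -1;
      if (PatLen <= 0) or (PatLen > BufLen) then
        Exit;
      Last := BufLen - PatLen; // last index where a full match still fits
      for I := 0 to Last do
        // Cheap first-byte test before the full comparison
        if (Buf[I] = Pat[0]) and CompareMem(@Buf[I], Pat, PatLen) then
          Exit(I);
    end;

    var
      Hay, Needle: TBytes;
    begin
      // The haystack contains an embedded #0 before the match
      Hay := TEncoding.ANSI.GetBytes('alpha'#0'beta gamma');
      Needle := TEncoding.ANSI.GetBytes('beta');
      Writeln(BytePos(PByte(Hay), Length(Hay), PByte(Needle), Length(Needle)));
    end.
    ```

    For a mapped view you would pass the view pointer and the mapped length as Buf and BufLen. For long patterns or very large files a smarter algorithm such as Boyer-Moore-Horspool would likely be faster, but the naive loop is often good enough.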

    Either find a way to identify whether a file is ANSI or Unicode (UTF-16), or simply call both the Ansi and the Unicode versions on each mapped part of the file.
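
    One possible way to handle both cases is sketched below: encode the search text once as ANSI bytes and once as UTF-16 LE bytes, and optionally check for a byte order mark at the start of the file. The BOM check is only a heuristic (many UTF-16 files have no BOM, and a UTF-32 LE BOM starts with the same two bytes), and HasUtf16LeBom is a hypothetical name.

    ```delphi
    program NeedleEncodingsDemo;

    {$APPTYPE CONSOLE}
    {$POINTERMATH ON}

    uses
      System.SysUtils;

    // Hypothetical helper: True if the buffer starts with the UTF-16 LE
    // byte order mark (FF FE). Heuristic only.
    function HasUtf16LeBom(Buf: PByte; Len: NativeInt): Boolean;
    begin
      Result := (Len >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE);
    end;

    var
      AnsiNeedle, WideNeedle, Sample: TBytes;
    begin
      // The same search text as raw bytes in both encodings
      AnsiNeedle := TEncoding.ANSI.GetBytes('beta');
      WideNeedle := TEncoding.Unicode.GetBytes('beta'); // UTF-16 LE, no BOM
      Writeln('ANSI pattern length:   ', Length(AnsiNeedle));
      Writeln('UTF-16 pattern length: ', Length(WideNeedle));

      // A made-up file start: BOM followed by the letter "b" in UTF-16 LE
      Sample := TBytes.Create($FF, $FE, $62, $00);
      Writeln('Starts with UTF-16 LE BOM: ', HasUtf16LeBom(PByte(Sample), Length(Sample)));
    end.
    ```

    If you search for both byte patterns in every mapped block, size the overlap between blocks for the longer (UTF-16) pattern.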

    Once you are done with each file, call UnmapViewOfFile first, then CloseHandle on the file mapping handle, and finally CloseHandle on the file handle.

    EDIT:

    A big advantage of using memory mapped files instead of using e.g. a TFileStream to read the file into memory in blocks is that the bytes will only end up in memory once.

    Normally, on file access, Windows first reads the bytes into the OS file cache and then copies them from there into the application's memory.

    If you use memory mapped files, the OS can directly map the physical pages from the OS file cache into the address space of the application without making another copy (saving the time needed for the copy and halving memory usage).

    Bonus Answer: By calling StrLIComp instead of StrLComp you can do a case insensitive search.
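
    A tiny sketch of the difference between the two comparison functions, using the PChar overloads from System.SysUtils (the program name is made up for the example):

    ```delphi
    program CaseInsensitiveDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    begin
      // Case-sensitive comparison of the first 6 characters: not equal
      Writeln(StrLComp(PChar('Delphi'), PChar('DELPHI'), 6) = 0);
      // Case-insensitive comparison of the first 6 characters: equal
      Writeln(StrLIComp(PChar('Delphi'), PChar('DELPHI'), 6) = 0);
    end.
    ```

    The first line prints FALSE and the second prints TRUE, so swapping the call inside your copy of the search routine is enough to ignore case.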
