Fastest approach to search within file contents of a directory

野性不改 2020-12-10 23:42

I have a directory that contains files for the users of a program of mine. There are around 70k JSON files in that directory.

The current search method is using gl

2 answers
  •  予麋鹿 (original poster)
    2020-12-11 00:11

    The answer differs depending on whether the files are stored on an SSD or an HDD.

    HDD

    With an HDD, the most probable bottleneck isn't PHP but the low number of I/O operations an HDD can handle. I would strongly advise moving to an SSD, or to a RAM disk if that's feasible.

    Let's assume you can't move the directory to an SSD. That means you're stuck on an HDD, which can perform roughly 70 to 200 IOPS (I/O operations per second, assuming your system doesn't cache the directory's files in RAM). Your best bet is to minimize I/O calls such as fstat, filemtime, and file_exists, and to spend the available operations on calls that actually read the files (file_get_contents() etc.).
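    To put those figures in perspective, here is a rough back-of-the-envelope estimate. It assumes, purely for illustration, that every file costs exactly one random I/O operation and that nothing is cached; real numbers will differ, and every extra metadata call per file adds to the total.

```shell
# Rough estimate only: one uncached random I/O per file (an assumption).
files=70000          # approximate file count from the question

for iops in 70 200; do
  # Integer division gives the seconds needed at this IOPS rate.
  echo "$iops IOPS: about $((files / iops)) seconds for one I/O per file"
done
```

    Under these assumptions a full pass over the directory takes somewhere between about 350 and 1000 seconds.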

    HDDs and operating systems allow the HDD controller to group I/O operations to work around the low IOPS available. For example, if two or more files are close to each other on the disk, you can read all of them at roughly the cost of reading just one (I'm simplifying here, but let's not get into the technical details). So, contrary to some beliefs, reading multiple files at once (for example with a threaded program, xargs, etc.) might greatly improve performance.

    Unfortunately, this only holds if those files are physically close to each other on the HDD. If you really want to speed things up, first work out the order in which your application is going to read the files, as it's crucial for the next step. Once you've figured that out, you can erase the HDD completely (assuming you're able to) and write the files to it sequentially in the order you settled on. This should place the files side by side and improve the effective IOPS when reading files in parallel.

    Next, go to the shell and use a program that can process files in parallel. PHP has support for pthreads, but don't go down that route. xargs with multiple processes (the -P option) might be helpful if you plan to use a single-threaded application. Read the shell_exec() output and process it in your PHP program.
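    A minimal sketch of the xargs approach might look like the following. The temporary directory, the sample files, the search term 'alice', the process count of 4, and the batch size of 500 are all made up for illustration; grep stands in for whatever single-threaded search program you actually use. The -print0, -0, and -P options are extensions available in GNU and BSD find/xargs rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical demo data: a temp directory standing in for the real one.
dir=$(mktemp -d)
printf '{"name":"alice"}\n'   > "$dir/1.json"
printf '{"name":"bob"}\n'     > "$dir/2.json"
printf '{"name":"alice b"}\n' > "$dir/3.json"

# find emits NUL-separated paths so unusual file names survive intact.
# xargs -P 4 runs up to 4 grep processes at once; -n 500 passes at most
# 500 files to each. grep -l prints only the names of matching files,
# and -F treats the pattern as a fixed string rather than a regex.
matches=$(find "$dir" -type f -name '*.json' -print0 \
  | xargs -0 -P 4 -n 500 grep -lF 'alice' \
  | sort)

echo "$matches"

# Clean up the demo directory.
rm -rf "$dir"
```

    The output is a plain list of matching file paths, one per line, which is easy to capture with shell_exec() and split in PHP. The right value for -P depends on the disk and is worth measuring rather than guessing.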

    SSD

    As with an HDD, parallel processing might help. It would be best to see your code first, however, as I/O might not be the problem.
