Question
I have a site with 2,000 pages and I want to iterate through each page to generate a sitemap, using the file_get_html() function and regular expressions.
Obviously this can't be completed in a single server-side execution, as it will hit the maximum execution time. I guess it needs to be broken into smaller actions that save their progress to the database and then queue the next task, roughly as in the sketch below. Any suggestions?
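A minimal sketch of the batching I have in mind, assuming the page URLs sit in a urls.txt file, progress is kept in a local SQLite file, and each run (triggered by cron or a follow-up request) handles one batch; the file names and batch size are placeholders:

<?php
// Each run processes one batch, records its offset in SQLite and exits;
// the next run picks up where this one stopped.
require_once 'simple_html_dom.php'; // provides file_get_html()

$batchSize = 50;
$db = new PDO('sqlite:' . __DIR__ . '/sitemap_progress.db');
$db->exec('CREATE TABLE IF NOT EXISTS progress (id INTEGER PRIMARY KEY, offset INTEGER NOT NULL)');
$db->exec('INSERT OR IGNORE INTO progress (id, offset) VALUES (1, 0)');

$offset = (int) $db->query('SELECT offset FROM progress WHERE id = 1')->fetchColumn();
$urls   = array_slice(file('urls.txt', FILE_IGNORE_NEW_LINES), $offset, $batchSize);

foreach ($urls as $url) {
    $html = file_get_html($url);
    if ($html === false) {
        continue; // skip unreachable pages
    }
    // ... extract what the sitemap entry needs and append it to sitemap.xml ...
    $html->clear(); // free the memory simple_html_dom holds
}

// Persist progress for the next run.
$db->prepare('UPDATE progress SET offset = ? WHERE id = 1')
   ->execute([$offset + count($urls)]);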
Answer 1:
When you run it from the command line, there is no maximum execution time.
You can also use set_time_limit(0); for this, if your hosting provider allows it.
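For example (a minimal sketch; the script name and the crawl itself are placeholders):

<?php
// generate_sitemap.php — run it with: php generate_sitemap.php
// The CLI already defaults to no time limit; set_time_limit(0) makes that
// explicit and also covers the case where the script is triggered over HTTP.
set_time_limit(0);

// ... long-running crawl / sitemap generation goes here ...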
I can't tell whether your IP address will get banned, as that depends on the security of the server you send your requests to.
Another solution
You can fetch one page (or a few) and scan its source code for new URLs, queueing them in a database. On the next run, you process that queue, roughly as in the sketch below.
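A rough sketch of such a queue, assuming a local SQLite file and a deliberately naive regex for link extraction (the table name, seed URL and batch size are made up for the example):

<?php
// Each run pops a handful of pending URLs, scans them for links on the same
// host and pushes any new ones back onto the queue.
$db = new PDO('sqlite:' . __DIR__ . '/crawl_queue.db');
$db->exec('CREATE TABLE IF NOT EXISTS queue (url TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)');
$db->exec("INSERT OR IGNORE INTO queue (url) VALUES ('https://example.com/')"); // seed page

$pending = $db->query('SELECT url FROM queue WHERE done = 0 LIMIT 10')->fetchAll(PDO::FETCH_COLUMN);

foreach ($pending as $url) {
    $source = @file_get_contents($url);
    if ($source !== false) {
        // Naive absolute-link extraction; a real crawler should also resolve
        // relative URLs and respect robots.txt.
        preg_match_all('/href="(https?:\/\/example\.com[^"]*)"/i', $source, $matches);
        $insert = $db->prepare('INSERT OR IGNORE INTO queue (url) VALUES (?)');
        foreach ($matches[1] as $found) {
            $insert->execute([$found]);
        }
    }
    $db->prepare('UPDATE queue SET done = 1 WHERE url = ?')->execute([$url]);
}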
Answer 2:
You should consider using a job queue and worker implementation. I would recommend Gearman or ZeroMQ. Both of these have native PHP bindings.
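For instance, with the Gearman PHP extension a worker/client pair could look roughly like this; the job server address, the crawl_page function name and the urls.txt list are assumptions for the example:

<?php
// worker.php — run one or more of these; each processes pages handed to it
// by the Gearman job server.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('crawl_page', function (GearmanJob $job) {
    $url  = $job->workload();
    $html = @file_get_contents($url);
    // ... parse $html and write the sitemap entry ...
    return 'done: ' . $url;
});
while ($worker->work()) {
    // keep looping, processing jobs as they arrive
}

<?php
// client.php — queues one background job per page.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
foreach (file('urls.txt', FILE_IGNORE_NEW_LINES) as $url) {
    $client->doBackground('crawl_page', $url);
}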
Answer 3:
Use set_time_limit(0). See the PHP manual for a more detailed explanation:
seconds
The maximum execution time, in seconds. If set to zero, no time limit is imposed.
EDIT: As for your second question, it's not likely; however, you should check your hosting service's Terms of Use to see whether it's allowed.
Answer 4:
Set max_execution_time to 0 in your php.ini. It will affect every script you run on the server, but if you're looking for a server-level fix, this will do it.
http://php.net/manual/en/info.configuration.php#ini.max-execution-time
max_execution_time = 0
Answer 5:
The best way for you is to use a remote API. For example, you can use import.io and get the parameters of each page in JSON format; that way each call returns a much lighter payload than pulling the full page with file_get_contents() or file_get_html().
But for this task, cURL is better than file_get_html(); see the sketch below.
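A minimal sketch of the cURL side (the URL is a placeholder; the response body can then be fed to regular expressions or to str_get_html() from simple_html_dom):

<?php
$ch = curl_init('https://example.com/page-1');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
    CURLOPT_FOLLOWLOCATION => true,   // follow redirects
    CURLOPT_TIMEOUT        => 30,     // fail fast on slow pages
]);
$html = curl_exec($ch);
curl_close($ch);

if ($html !== false) {
    // ... run the sitemap extraction on $html ...
}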
Source: https://stackoverflow.com/questions/7036767/how-can-i-get-infinite-maximum-execution-time-with-php