Trying to build a distributed crawler with ZeroMQ
I just started learning ZeroMQ and want to build a distributed web crawler as an example while learning. My idea is to have a "server", written in PHP, which accepts a URL where the crawling should start. Workers (C# CLI) crawl that URL, extract links, and push them back onto a stack on the server. The server keeps sending URLs from the stack to workers. Redis could keep track of all crawled URLs, so we don't crawl sites multiple times and can extract statistics about the current process. I would like the server to distribute tasks evenly and be aware of new workers joining.
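As a starting point I'm considering ZeroMQ's PUSH/PULL pipeline pattern: the server PUSHes URLs out to workers and PULLs discovered links back. Below is a minimal sketch of the server loop, assuming the php-zmq and phpredis extensions; the port numbers, the in-memory queue, and the `crawled` Redis set are just my own placeholder choices, not a fixed design.

```php
<?php
// Sketch of the server loop (assumes php-zmq and phpredis are installed;
// ports 5557/5558 and the "crawled" set name are arbitrary placeholders).
$ctx = new ZMQContext();

// PUSH socket: hands URLs out; ZeroMQ round-robins them across
// all connected PULL workers, so distribution is even by default.
$tasks = new ZMQSocket($ctx, ZMQ::SOCKET_PUSH);
$tasks->bind("tcp://*:5557");

// PULL socket: collects the links each worker extracted.
$results = new ZMQSocket($ctx, ZMQ::SOCKET_PULL);
$results->bind("tcp://*:5558");

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$queue = ["http://example.com/"]; // seed URL; stand-in for the real entry point

while (true) {
    // Dispatch everything currently queued.
    while ($url = array_pop($queue)) {
        // sAdd returns 1 only if the URL was not in the set yet,
        // so each URL gets crawled at most once.
        if ($redis->sAdd('crawled', $url)) {
            $tasks->send($url);
        }
    }
    // Block until a worker reports back a newline-separated list of links.
    $links = explode("\n", $results->recv());
    foreach ($links as $link) {
        if ($link !== '') {
            $queue[] = $link;
        }
    }
}
```

If I understand the ZeroMQ guide correctly, a PUSH socket fair-queues messages across all currently connected workers, so even distribution and late-joining workers would come for free with this pattern; each worker would just connect a PULL socket to port 5557 and a PUSH socket to port 5558.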