I'm trying to write a multithreaded web crawler.
My main entry class has the following code:
ExecutorService exec = Executors.newFixedThreadPool(num
The question is a bit old, but I think I have found a simple, working solution:
Extend the ThreadPoolExecutor class as below. The new functionality is keeping a count of active tasks (unfortunately, the provided getActiveCount() only returns an approximate number, so it can't be relied on). If taskCount.get() == 0 and there are no more queued tasks, there is nothing left to do and the executor shuts down. That gives you your exit criterion. Also, if you create your executor but never submit any tasks, it won't block:
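import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlingThreadPoolExecutor extends ThreadPoolExecutor {
    // Number of tasks currently running, maintained by the two hooks below
    private final AtomicInteger taskCount = new AtomicInteger();

    public CrawlingThreadPoolExecutor() {
        // 8 fixed threads and an unbounded work queue
        super(8, 8, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        taskCount.incrementAndGet();
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        taskCount.decrementAndGet();
        // Exit criterion: nothing waiting in the queue and no task counted as running
        if (getQueue().isEmpty() && taskCount.get() == 0) {
            shutdown();
        }
    }
}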
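A caveat, based on my reading of the ThreadPoolExecutor documentation: a task is only counted once a worker calls beforeExecute for it, and by then it has already left the queue. While fewer than the core number of threads exist, execute() does not queue the task at all; it hands it straight to a newly created worker. If a parent task's afterExecute runs inside that window, it sees an empty queue and a zero count, calls shutdown(), and the child's own execute() calls are then rejected. One way to close the gap, sketched here as my own variant (the class name PendingCountExecutor is hypothetical), is to count a task when it is submitted rather than when it starts:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant: a task is counted from the moment it is submitted until
// it has finished running, so tasks sitting in the queue and tasks handed
// straight to a new worker are already included in the count.
class PendingCountExecutor extends ThreadPoolExecutor {
    private final AtomicInteger pending = new AtomicInteger();

    PendingCountExecutor(int threads) {
        super(threads, threads, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
    }

    @Override
    public void execute(Runnable command) {
        pending.incrementAndGet(); // count before any worker can pick the task up
        try {
            super.execute(command);
        } catch (RejectedExecutionException ex) {
            pending.decrementAndGet(); // the task will never run, so stop counting it
            throw ex;
        }
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // A task submits its child links before it finishes, so (assuming all work
        // after the seed is submitted from inside the pool) the count only reaches
        // zero once nothing is queued, running, or about to run.
        if (pending.decrementAndGet() == 0) {
            shutdown();
        }
    }
}
```

Since AbstractExecutorService.submit() delegates to execute(), tasks passed to submit() are counted too. The rollback in the catch block assumes the default AbortPolicy, which throws on rejection.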
The other thing you have to do is implement your Runnable so that it keeps a reference to the Executor you are using, so that it can submit new tasks. Here is a mock:
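import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

public class MockFetcher implements Runnable {
    private final String url;
    private final Executor e; // executor used to submit the links found on this page

    public MockFetcher(final Executor e, final String url) {
        this.e = e;
        this.url = url;
    }

    @Override
    public void run() {
        final List<String> newUrls = new ArrayList<>();
        // Parse doc and build url list, and then:
        for (final String newUrl : newUrls) {
            // Each discovered URL becomes a new task on the same executor
            e.execute(new MockFetcher(this.e, newUrl));
        }
    }
}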
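The exit criterion also relies on the crawl actually running out of work: two pages that link to each other would otherwise keep re-submitting each other, and the executor would never shut down. A minimal sketch of the mock with a shared, thread-safe visited set follows. The links map is a hypothetical in-memory stand-in for downloading and parsing a page, and DedupFetcher is an illustrative name, not something from the question:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executor;

// Variant of MockFetcher that submits each URL at most once. `links` is a
// hypothetical in-memory stand-in for downloading a page and parsing its links.
class DedupFetcher implements Runnable {
    private final Executor e;
    private final Map<String, List<String>> links;
    private final Set<String> visited; // shared by all fetchers; must be thread-safe
    private final String url;

    DedupFetcher(Executor e, Map<String, List<String>> links, Set<String> visited, String url) {
        this.e = e;
        this.links = links;
        this.visited = visited;
        this.url = url;
    }

    @Override
    public void run() {
        List<String> newUrls = links.getOrDefault(url, Collections.<String>emptyList());
        for (String newUrl : newUrls) {
            // add() returns false if another task already claimed this URL, so on a
            // concurrent set the check and the claim happen in one atomic step.
            if (visited.add(newUrl)) {
                e.execute(new DedupFetcher(e, links, visited, newUrl));
            }
        }
    }
}
```

The caller would create the set with ConcurrentHashMap.newKeySet(), add the seed URL to it, and then submit new DedupFetcher(exec, links, visited, seed). Using Set.add as the claim step means two threads that discover the same link cannot both submit it.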