Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I want to submit Runnable tasks into ForkJoinPool via a method:

forkJoinPool.submit(Runnable task)

Note, I use JDK 7.

Under the hood, they are transformed into ForkJoinTask objects. I know that ForkJoinPool is efficient when a task is split into smaller ones recursively.

Question:

Does work-stealing still work in the ForkJoinPool if there is no recursion?

Is it worth it in this case?

Update 1: Tasks are small and can be unbalanced. Even for strictly equal tasks, such things like context switching, thread scheduling, parking, pages misses etc. get in the way leading to the imbalance.

Update 2: Doug Lea wrote in the Concurrency JSR-166 Interest group, by giving a hint on this:

This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.

I presume, when it comes to reasonably small CPU-bound tasks, ForkJoinPool is the way to go, thanks to this optimization. The main point is that these tasks are already small and needn't a recursive decomposition. Work-stealing works, regardless whether it is a big or small task - tasks can be grabbed by another free worker from the Deque's tail of a busy worker.

Update 3: Scalability of ForkJoinPool - benchmarking by Akka team of ping-pong shows great results.

Despite this, to apply ForkJoinPool more efficiently requires performance tuning.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
721 views
Welcome To Ask or Share your Answers For Others

1 Answer

ForkJoinPool source code has a nice section called "Implementation Overview", read up for an ultimate truth. The explanation below is my understanding for JDK 8u40.

Since day one, ForkJoinPool had a work queue per worker thread (let's call them "worker queues"). The forked tasks are pushed into the local worker queue, ready to be popped by the worker again and be executed -- in other words, it looks like a stack from worker thread perspective. When a worker depletes its worker queue, it goes around and tries to steal the tasks from other worker queues. That is "work stealing".

Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well the tasks to steal, they got there and tried to see if external work is available. In this design, there is no advantage against a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue.

It changed significantly after then. After this submission queue was identified as the serious performance bottleneck, Doug Lea et al. striped the submission queues as well. In hindsight, that is an obvious idea: you can reuse most of the mechanics available for worker queues. You can even loosely distribute these submission queues per worker threads. Now, the external submission goes into one of the submission queues. Then, workers that have no work to munch on, can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.

I have seen many workloads benefiting from this. This particular design advantage of ForkJoinPool even for plain non-recursive tasks was recognized a long ago. Many users at concurrency-interest@ asked for a simple work-stealing executor without all the ForkJoinPool arcanery. This is one of the reasons, why we have Executors.newWorkStealingPool() in JDK 8 onward -- currently delegating to ForkJoinPool, but open for providing a simpler implementation.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...