Go Concurrency Series: Deep Dive into the Go Scheduler (II)
Understand how the Go Scheduler optimizes CPU utilization by leveraging the Network Poller, Timer Heaps, Work Stealing, and Thread Parking/Unparking!
In my last post, we covered the components inside the Go Scheduler and how it orchestrates the management of Goroutines so that we get maximum CPU utilization. In this article, we will dive deep into how the Go Scheduler maintains fairness and tries to ensure that Goroutines are not starved!
Let’s discuss another component in the Go Runtime that helps the Go Scheduler out when managing fairness across Goroutines!
Network Poller
The Network Poller manages network I/O operations initiated by different goroutines. Whenever a goroutine performs a network I/O operation, instead of blocking the thread it is running on, the operation is handed over to the network poller, which monitors it. Once the network I/O operation is ready (data is available to read, or the socket is ready for writing), the network poller notifies the Go runtime scheduler, which then reschedules the goroutine for execution.
So, the network poller is the reason Go doesn’t have to block OS threads on network I/O, and it helps avoid creating new threads just to handle new goroutines. The runtime can simply hand the blocking operation over to the network poller and run other goroutines on the thread!
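To get a feel for this, here is a minimal sketch (the URL and request count are arbitrary placeholders) that fires off many concurrent HTTP requests. Each request spends most of its time waiting on the network, but the waiting goroutines are parked on the network poller rather than each pinning an OS thread, so a handful of threads can drive all of them.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	const requests = 50 // arbitrary; far more in-flight requests than OS threads

	var wg sync.WaitGroup
	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// While this call waits on the network, the goroutine is parked
			// on the network poller; the OS thread is free to run others.
			resp, err := http.Get("https://example.com") // placeholder URL
			if err != nil {
				fmt.Println(id, "error:", err)
				return
			}
			resp.Body.Close()
			fmt.Println(id, "status:", resp.Status)
		}(i)
	}
	wg.Wait()
}
```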
Timer Heaps
Whenever Goroutines encounter some time-related activity, i.e. an activity where they have to wait for a certain time or trigger something after a deadline (time.Sleep, time.After, time.NewTimer, time.NewTicker, etc.), the Go Scheduler suspends the Goroutine and places the corresponding timer on the timer heap.
As the name suggests, Timer Heaps are heap data structures, specifically min-heaps, so the Go Scheduler can read the next expiring timer at the root in O(1) and remove it in O(log N). Adding a new timer involves "sifting up" the new timer in the heap to find its correct position, ensuring the earliest timer remains at the root; this also takes O(log N).
It’s important to note here that the timer heaps exist per P (Processor). So whenever a goroutine encounters some time-related activity, the timer is added to the timer heap of its P.
How are timers polled from the Timer Heap?
Whenever a P is trying to find a runnable Goroutine to execute, it checks its timer heap to see if any timers have expired. If so, it finds the Goroutine corresponding to the expired timer and runs it!
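The runtime’s actual timer representation lives inside the runtime package and isn’t exposed, but a toy min-heap built on container/heap (all names below are invented for illustration) shows the two operations described above: peeking at the earliest deadline at the root and sifting a new timer into place.

```go
package main

import (
	"container/heap"
	"fmt"
	"time"
)

// timer is a toy stand-in for the runtime's internal timer struct.
type timer struct {
	when time.Time // deadline
	name string
}

// timerHeap implements heap.Interface as a min-heap ordered by deadline.
type timerHeap []timer

func (h timerHeap) Len() int           { return len(h) }
func (h timerHeap) Less(i, j int) bool { return h[i].when.Before(h[j].when) }
func (h timerHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *timerHeap) Push(x any)        { *h = append(*h, x.(timer)) }
func (h *timerHeap) Pop() any {
	old := *h
	n := len(old)
	t := old[n-1]
	*h = old[:n-1]
	return t
}

func main() {
	now := time.Now()
	h := &timerHeap{}

	// Adding a timer sifts it to its correct position: O(log N).
	heap.Push(h, timer{when: now.Add(3 * time.Second), name: "ticker"})
	heap.Push(h, timer{when: now.Add(1 * time.Second), name: "sleep"})
	heap.Push(h, timer{when: now.Add(2 * time.Second), name: "after"})

	// The earliest timer is always at the root: O(1) to peek, O(log N) to remove.
	next := heap.Pop(h).(timer)
	fmt.Println("next expiring timer:", next.name)
}
```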
Now that we know about Network Poller and Timer Heap, let’s transition to understanding some concepts around what’s going on inside the Go Scheduler.
Thread Parking/Unparking
The Go Scheduler needs to strike a balance between keeping threads running to utilize the parallelism of the system and parking unused threads to save resources (so that they can be reused later).
Challenge
If you think about it, this is a hard challenge! Go doesn’t know what work will arrive in the future, hence it’s very difficult to determine how many threads to park.
At the same time, let’s assume that a goroutine running on P1 creates a new goroutine. Now, if the Local Run Queue of P1 is full and the Go Scheduler finds a spare P that has no active thread associated with it, it will unpark a thread and hand both the thread and the new goroutine over to this P. This has two challenges -
The locality of computation is destroyed: the new goroutine typically works on data related to its parent, but it now runs on a different P, possibly on a different CPU core with a cold cache.
The new P might complete its work quickly, or the work might get blocked, leaving the P idle immediately after the handoff. It would then either have to park its thread (since it has no more work) or look for more work. If it parks, this can lead to thrashing, where a lot of time is spent just transitioning between unparked (active) and parked (idle) states.
Solution
To unpark an additional thread, the Go Scheduler ensures that both of the following hold -
There is an idle P. An idle P does not have any work in its local run queue and isn’t able to find work in the GRQ, the Netpoller, or the Timer Heaps either.
There are no spinning worker threads. A worker thread is considered to be spinning if it’s constantly polling the GRQ, the Netpoller, and the Timer Heaps for work without finding any.
If a spinning thread finds work, it takes itself out of the spinning state and proceeds to execution. Before starting execution, it checks whether it was the last spinning thread, and if so, it unparks another thread, which will keep spinning.
If a spinning thread cannot find work, it takes itself out of the spinning state and parks itself. The last spinning thread re-checks all the sources of work one more time before parking, so that work submitted concurrently isn’t missed. This ensures new work always gets picked up promptly, without all threads just spinning and burning CPU.
Go ensures that the number of spinning M's is at most half the number of busy P's. This keeps the number of spinning M's proportional to the amount of actual parallel work available, and keeps CPU usage in check when there is no real work.
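The exact logic lives in runtime/proc.go and changes between releases, but a simplified, self-contained model of the ratio check (all names below are invented for illustration) looks roughly like this:

```go
package main

import "fmt"

// shouldStartSpinning is a toy model of the heuristic the runtime uses when
// deciding whether another M may enter the spinning state: keep the number
// of spinning M's at no more than half the number of busy P's.
func shouldStartSpinning(nmSpinning, gomaxprocs, idlePs int) bool {
	busyPs := gomaxprocs - idlePs
	return 2*nmSpinning < busyPs
}

func main() {
	// With 8 P's, 2 of them idle (6 busy), and 2 M's already spinning,
	// 2*2 < 6, so one more M may start spinning.
	fmt.Println(shouldStartSpinning(2, 8, 2)) // true

	// With 3 M's already spinning, 2*3 < 6 is false: no more spinners.
	fmt.Println(shouldStartSpinning(3, 8, 2)) // false
}
```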
Work Stealing
We’ve discussed how a Processor P can look for runnable Goroutines in its LRQ, the GRQ, the Netpoller, or the Timer Heap. However, if it still isn’t able to find any work, it can always steal work from other processors.
Stealing is not always bad! - Go Scheduler
If a P has a spinning thread (M), the thread can attempt to steal runnable goroutines or timers from any other P. In practice, it picks victim P's in random order and grabs half of the victim’s local run queue in one go.
Work stealing helps maximize CPU utilization by ensuring that idle P's do not remain inactive when there is work available. Idle P's can steal work from other P's, thereby keeping all CPU cores busy and utilizing the available computing resources effectively.
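The real implementation (runqsteal in runtime/proc.go) works on fixed-size circular buffers with atomics, but a simplified sketch of the core idea, stealing half of a victim’s local run queue, might look like this (the types and function here are invented for illustration):

```go
package main

import "fmt"

// goroutineID stands in for a runnable goroutine in this toy model.
type goroutineID int

// stealHalf moves half of the victim's run queue onto the thief's run queue
// and returns one goroutine to run immediately, mimicking the idea behind
// the runtime's work stealing (the real code uses lock-free ring buffers).
func stealHalf(thief, victim *[]goroutineID) (goroutineID, bool) {
	n := len(*victim) / 2
	if n == 0 {
		return 0, false
	}
	stolen := (*victim)[:n]
	*victim = (*victim)[n:]

	// Run the first stolen goroutine right away; queue the rest locally.
	*thief = append(*thief, stolen[1:]...)
	return stolen[0], true
}

func main() {
	victim := []goroutineID{1, 2, 3, 4, 5, 6}
	thief := []goroutineID{}

	if g, ok := stealHalf(&thief, &victim); ok {
		fmt.Println("running stolen goroutine:", g)
	}
	fmt.Println("thief queue:", thief)   // [2 3]
	fmt.Println("victim queue:", victim) // [4 5 6]
}
```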
Handoff
We’ve covered how the Go Scheduler handles cases when Goroutine makes network I/O calls or timer-related calls. But what happens when the goroutine makes a blocking syscall?
When a Goroutine makes a blocking syscall, it blocks the kernel thread (M) it is running on as well, which means no other goroutine can run on that P; the local run queue starves while P does no work! To handle these cases, Go dissociates the P from the blocked M (thread). The P is then assigned an idle thread, or a new thread is created and assigned to it.
However, handoffs can become expensive, especially if a new thread is created each time during a handoff. Also, the blocking syscall might be a very short one and in such cases, handoffs are not optimal.
So, the Go scheduler is smart about handoffs. It does an immediate handoff if it knows the syscall is likely to block for a long time. For the other cases, it relies on Sysmon.
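You can get a feel for handoff with a small experiment. In the hedged sketch below (Unix-only, since it shells out to the sleep command), GOMAXPROCS is pinned to 1 and one goroutine blocks in a syscall while waiting for a child process; the other goroutine keeps making progress because the runtime hands the single P off to another thread instead of letting it sit behind the blocked one.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P, to make the handoff visible

	// This goroutine blocks an OS thread in a wait syscall for ~2 seconds.
	go func() {
		if err := exec.Command("sleep", "2").Run(); err != nil {
			fmt.Println("exec error:", err)
			return
		}
		fmt.Println("blocking syscall finished")
	}()

	// Despite GOMAXPROCS(1), this loop keeps making progress: the P was
	// handed off to another thread while the first thread sat in the syscall.
	for i := 0; i < 5; i++ {
		time.Sleep(500 * time.Millisecond)
		fmt.Println("still running:", i)
	}
}
```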
Sysmon
Sysmon is a background task responsible for monitoring and managing various system-level aspects of the Go runtime.
Since sysmon has multiple responsibilities and this article is becoming a little too big, I’ll cover sysmon in a separate article!
Fairness
A fair scheduler ensures that each goroutine gets a chance to be executed, and tries to give a nearly equal amount of time to all goroutines! The early versions of the scheduler were cooperative: goroutines voluntarily relinquished control (typically at function calls), which led to fairness issues when long-running goroutines, such as tight loops with no function calls, never yielded.
Preemptive Scheduling
Go's scheduler uses a time slice, often referred to as a time quantum or time slot, to determine when to preempt a running goroutine; in the current implementation this is roughly 10ms. The time slice is not strictly enforced but is a soft limit: once a goroutine has been running longer than its allocated slice, it becomes eligible for preemption, though the scheduler may let it run slightly longer to avoid the performance overhead of frequent preemptions.
Since Go 1.14, the scheduler has supported non-cooperative (asynchronous) preemption, implemented by sending a signal to the running thread, which allows even tight loops without function calls to be preempted and makes scheduling fairer.
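The classic way to see this is a tight loop with no function calls. On Go versions before 1.14 (with GOMAXPROCS set to 1), the sketch below could hang forever because the loop never reached a cooperative yield point; with asynchronous preemption it prints the final line after about a second.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P, so the hog competes directly with main

	// A tight loop with no function calls: before Go 1.14 this goroutine
	// could only be preempted cooperatively, so it could starve main forever.
	go func() {
		for {
		}
	}()

	time.Sleep(time.Second)

	// With asynchronous preemption (Go 1.14+), the hog goroutine is
	// preempted and main gets scheduled again.
	fmt.Println("main goroutine was scheduled again")
}
```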
This is it for this edition of the Go Concurrency Series. The upcoming posts will start being a little more hands-on now that you have an understanding of the internals!