
2026-05-13 20:09:04

Optimizing Go Performance: Shifting Allocations from Heap to Stack

Go reduces heap allocations by moving small, constant-sized slices onto the stack, lowering GC overhead and improving performance.

Introduction

Performance is a constant focus for the Go team, and recent releases have targeted a major bottleneck: heap allocations. Every time a Go program requests memory from the heap, a significant amount of runtime code executes to fulfill that request. Additionally, heap allocations add pressure on the garbage collector, which—even with modern improvements like the Green Tea collector—still carries overhead. To address this, the Go team has been working on moving more allocations from the heap to the stack, where they are much cheaper and often completely free. Stack allocations also impose no burden on the garbage collector, are automatically reclaimed when the stack frame is destroyed, and promote cache-friendly reuse.

Source: blog.golang.org

The Problem with Heap Allocations

Heap allocations are expensive because they require the runtime to find a suitable block of memory, update internal bookkeeping, and eventually clean up via garbage collection. Even with incremental and concurrent collectors, the overhead can add up, especially in hot code paths. Each allocation also tends to produce garbage that must be tracked and freed, increasing overall CPU usage and latency.

Stack Allocations Are Cheaper

Stack allocations, by contrast, are almost trivial: they simply adjust the stack pointer. Because each goroutine has its own stack, there is no contention. Memory on the stack is freed automatically when the function returns—no garbage collection needed. This simplicity makes stack allocations much faster and more predictable.
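As a hedged illustration of how escape analysis makes this decision, the sketch below contrasts two hypothetical functions: one whose data provably never outlives the call, and one that returns a pointer to a local variable, forcing it onto the heap. The function names and the `-gcflags=-m` annotations in the comments are illustrative, not from the original article.

```go
package main

import "fmt"

// sumLocal's array never outlives the call, so escape
// analysis can keep it on the goroutine's stack.
func sumLocal() int {
	vals := [4]int{1, 2, 3, 4} // stays on the stack
	s := 0
	for _, v := range vals {
		s += v
	}
	return s
}

// newCounter returns a pointer to a local variable, so the
// variable must outlive the call and escapes to the heap
// (reported as "moved to heap" by go build -gcflags=-m).
func newCounter() *int {
	n := 0
	return &n // escapes to the heap
}

func main() {
	fmt.Println(sumLocal()) // 10
	c := newCounter()
	*c++
	fmt.Println(*c) // 1
}
```

Running `go build -gcflags=-m` on code like this prints the compiler's escape-analysis decisions, which is a practical way to see where your allocations land.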

Example: Allocating a Slice

Consider a function that reads tasks from a channel and processes them:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

When this loop runs, the slice tasks starts with no backing array. Each time append is called and the slice is full, a new, larger backing array is allocated on the heap. The typical growth pattern doubles the size each time: first allocation size 1, then 2, 4, 8, and so on. Early iterations require many heap allocations and produce garbage as old arrays are abandoned. If the slice never grows large—common in many workloads—this startup overhead is especially wasteful.
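The growth pattern is easy to observe directly: the sketch below appends one element at a time and reports each time the capacity changes, i.e. each time a new backing array was allocated. The exact capacities are an implementation detail of the runtime and may vary by element type and Go version.

```go
package main

import "fmt"

func main() {
	var tasks []int
	prevCap := -1
	// Append one element at a time; a change in cap() means
	// append allocated a new, larger backing array and copied
	// the old elements into it.
	for i := 0; i < 8; i++ {
		tasks = append(tasks, i)
		if cap(tasks) != prevCap {
			fmt.Printf("len=%d cap=%d (new backing array)\n", len(tasks), cap(tasks))
			prevCap = cap(tasks)
		}
	}
}
```

For small slices this typically prints a doubling sequence of capacities, so collecting just 8 elements costs several separate allocations, each abandoning the previous array as garbage.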

In the past, these small backing arrays were always heap-allocated. However, the Go compiler has been enhanced to detect certain patterns where the final size of a slice is constant or predictable. In such cases, the compiler can allocate the entire backing array on the stack, avoiding heap allocations entirely.

Compiler Optimizations for Constant-sized Slices

Starting with Go 1.24 (and refined in later releases), the compiler can place the backing array of a slice on the stack if it can prove the slice never exceeds a certain size. For example, if you know you will always process at most 10 tasks, you could write:

const maxTasks = 10
func process(c chan task) {
    tasks := make([]task, 0, maxTasks)
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

With a fixed capacity, and provided escape analysis shows the slice does not outlive the function, the compiler can place the backing array on the stack. This eliminates the heap allocations for the slice, reducing GC pressure and improving performance. Even without an explicit capacity, the compiler's escape analysis can often prove that a small slice stays within a bound and allocate its backing array on the stack.
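You can measure the difference yourself with `testing.AllocsPerRun` from the standard library. The sketch below is an assumption-laden benchmark, not code from the article: the package-level `sink` variable deliberately forces the slice to escape so that both versions heap-allocate, isolating the cost of repeated growth versus a single pre-sized allocation.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces the slice to escape to the heap, so the
// comparison measures growth allocations, not stack placement.
var sink []int

func fill(n int, prealloc bool) {
	var s []int
	if prealloc {
		s = make([]int, 0, n) // one allocation up front
	}
	for i := 0; i < n; i++ {
		s = append(s, i) // without prealloc: reallocates as it grows
	}
	sink = s
}

func main() {
	grown := testing.AllocsPerRun(100, func() { fill(10, false) })
	pre := testing.AllocsPerRun(100, func() { fill(10, true) })
	fmt.Printf("growing: %.0f allocs/op, preallocated: %.0f allocs/op\n", grown, pre)
}
```

On a typical build, the growing version pays one allocation per capacity doubling while the pre-sized version pays exactly one; when the slice additionally does not escape, the pre-sized allocation can disappear entirely.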

Broader Impact and Future Work

These optimizations are part of a broader effort to minimize heap usage in Go. Similar improvements are being applied to other data structures and allocation patterns. The Go team continues to explore ways to extend stack allocation to more scenarios, such as dynamically sized slices that grow within a limited range. Each improvement reduces the load on the garbage collector and makes Go programs faster and more predictable.

For developers, the key takeaway is that writing code with clear size bounds can help the compiler make better optimization decisions. Using make with a pre-allocated capacity not only avoids repeated growth but also enables stack allocation in many cases. As Go evolves, more patterns will benefit from these automatic stack-placement optimizations, allowing you to focus on logic while the runtime handles memory efficiently.