
Optimizing Go Performance: Stack Allocation for Slices

Last updated: 2026-05-06 12:29:27 · Programming

The Cost of Heap Allocations

In recent releases, the Go team has focused on reducing a major source of performance bottlenecks: heap allocations. Whenever a Go program requests memory from the heap, a significant amount of code runs to satisfy that allocation. This not only slows down the immediate operation but also increases the workload on the garbage collector (GC). Even with advanced techniques like the Green Tea GC, the overhead remains substantial.

Source: blog.golang.org

Stack allocations, by contrast, are far cheaper—sometimes even free. They place no burden on the GC because stack memory is automatically reclaimed when the function returns. Additionally, stack allocations enable efficient reuse, which is highly cache-friendly and improves overall program speed.

How Slice Growth Creates Heap Allocations

Consider a common pattern: building a slice by appending items from a channel.

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

At first glance, this code looks simple, but let's examine what happens at runtime. On the first loop iteration, tasks has no backing array, so append must allocate one. Because the eventual size is unknown, the runtime starts small—allocating an array of size 1.

On the second iteration, that backing array is full, so append allocates a new array of size 2 and copies the old element over. The old array (size 1) becomes garbage.

The process repeats: iteration 3 allocates an array of size 4 and copies the two existing elements over, iteration 4 fits within that capacity of 4, iteration 5 allocates size 8, and so on. The slice grows by doubling its capacity each time it runs out of space.
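You can observe this growth sequence directly by printing the slice's capacity whenever append replaces the backing array. A small sketch (note that the exact growth factors are a runtime implementation detail and may vary between Go versions):

```go
package main

import "fmt"

func main() {
	var tasks []int
	prevCap := -1
	for i := 0; i < 10; i++ {
		tasks = append(tasks, i)
		if cap(tasks) != prevCap {
			// A capacity change means append allocated a new backing
			// array and copied the existing elements into it.
			fmt.Printf("len=%d cap=%d (new backing array)\n", len(tasks), cap(tasks))
			prevCap = cap(tasks)
		}
	}
}
```

On recent Go versions this reports new backing arrays at capacities 1, 2, 4, 8, and 16, matching the doubling described above.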

The Startup Phase Problem

This exponential growth works well for large slices, but the startup phase—when the slice is small—is surprisingly wasteful. In our example, the first three iterations each trigger a heap allocation and produce garbage. If the channel only delivers a handful of tasks, the program may spend more time allocating and collecting than actually processing.

Even for longer streams, the early allocations still happen. And in performance-critical code paths, these repeated small allocations can add up, creating pressure on the GC and slowing down the entire program.

Stack Allocation for Constant-Sized Slices

One way to avoid this overhead is to pre-allocate the slice with a known capacity. If you know—or can estimate—the maximum number of items, you can use make with a capacity argument:

func process(c chan task) {
    const maxTasks = 100
    tasks := make([]task, 0, maxTasks)
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

When the backing array is allocated via make with a constant size, the Go compiler can often place that array on the stack instead of the heap. Stack allocation eliminates the allocator call entirely for the backing array, and the array is reclaimed automatically when the function returns—no GC work needed.
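One way to see the difference is testing.AllocsPerRun, which reports the average number of heap allocations per call. The sketch below uses hypothetical buildGrowing and buildPrealloc helpers (not from the original post) to contrast incremental growth with a constant-capacity make; on current compilers the preallocated version typically measures zero heap allocations because the backing array stays on the stack:

```go
package main

import (
	"fmt"
	"testing"
)

// buildGrowing starts from a nil slice, so append must repeatedly
// allocate larger backing arrays on the heap as the slice grows.
func buildGrowing(n int) int {
	var xs []int
	for i := 0; i < n; i++ {
		xs = append(xs, i)
	}
	return len(xs)
}

// buildPrealloc uses make with a constant capacity; because the slice
// never leaves this function, the compiler can place the backing
// array on the stack.
func buildPrealloc(n int) int {
	const maxItems = 100
	xs := make([]int, 0, maxItems)
	for i := 0; i < n; i++ {
		xs = append(xs, i)
	}
	return len(xs)
}

func main() {
	growing := testing.AllocsPerRun(100, func() { buildGrowing(100) })
	prealloc := testing.AllocsPerRun(100, func() { buildPrealloc(100) })
	fmt.Printf("growing: %.0f allocs/op, prealloc: %.0f allocs/op\n", growing, prealloc)
}
```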

When to Use Pre-allocation

This optimization works best when:

  • The slice size is bounded and known at compile time (e.g., a constant).
  • You are willing to trade a slightly larger stack frame for faster allocation and reduced GC pressure.
  • The slice lives only within the function scope and isn't returned (otherwise it escapes to the heap).

For cases where the exact size isn't known but an upper bound exists, you can still benefit from pre-allocating with that bound. Even if the bound is an overestimate, the stack allocation is often cheaper than repeated small heap allocations.
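Overshooting the bound is also safe: if more items arrive than estimated, append simply reallocates a larger backing array on the heap and continues. A minimal sketch, with an illustrative constant expectedMax:

```go
package main

import "fmt"

func main() {
	const expectedMax = 4 // estimated upper bound (illustrative only)
	xs := make([]int, 0, expectedMax)
	for i := 0; i < 6; i++ { // more items arrive than estimated
		xs = append(xs, i)
	}
	// append transparently grew the slice past the bound; correctness
	// is preserved, only the stack-allocation benefit is lost.
	fmt.Println(len(xs), cap(xs)) // len is 6, cap is at least 6
}
```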

Other Stack Allocation Opportunities

The compiler applies similar optimizations to other patterns. For example, small fixed-size arrays, structs used only within a function, and closures that don't escape can all be stack-allocated. Check the compiler's escape analysis diagnostics (printed with go build -gcflags=-m) when reviewing performance-critical code; moving allocations from the heap to the stack is one of the easiest performance wins.
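Whether a value escapes can also be measured at runtime. In this hedged sketch, the hypothetical local and escapes helpers differ only in whether they return the slice; the //go:noinline directive keeps the compiler from inlining the call and optimizing the escape away:

```go
package main

import (
	"fmt"
	"testing"
)

// local keeps its slice inside the function, so the constant-capacity
// backing array is a candidate for stack allocation.
func local() int {
	buf := make([]byte, 0, 64)
	buf = append(buf, 'x')
	return len(buf)
}

// escapes returns the slice, so its backing array must outlive the
// call and is forced onto the heap.
//
//go:noinline
func escapes() []byte {
	buf := make([]byte, 0, 64)
	buf = append(buf, 'x')
	return buf
}

func main() {
	fmt.Printf("local: %.0f allocs/op\n", testing.AllocsPerRun(100, func() { local() }))
	fmt.Printf("escapes: %.0f allocs/op\n", testing.AllocsPerRun(100, func() { _ = escapes() }))
}
```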

In Go 1.23 and later, the team has also improved the compiler's ability to detect when a slice's backing array can be stack-allocated even without a constant capacity, as long as the compiler can prove the slice doesn't escape and the size is bounded.

Conclusion

Heap allocations are expensive, especially in the startup phase of slice growth. By pre-allocating slices with a suitable capacity—often a constant—you can convert many of those heap allocations into stack allocations, improving performance and reducing GC load. This simple change can yield significant speedups in hot code paths. Always profile your application and look for allocation-heavy patterns; stack-allocating constant-sized slices is a practical and effective optimization.