Introduction
One of Go’s biggest selling points is its high performance across a variety of workloads from APIs to databases to devops tooling. However, writing consistently optimized code requires awareness of performance best practices as well as an understanding of Go’s internals.
In this comprehensive guide, we’ll cover:
- Benchmarking bottlenecks
- Optimizing expensive loops
- Memory allocation tricks
- Reducing garbage collection pauses
- Improving goroutine performance
- Avoiding pitfalls like dynamic dispatch
- Leveraging compiler optimizations
- Performance from code architecture
- Comparing to C and assembly language
We’ll use plenty of examples to internalize Go performance optimization techniques for writing high-performance code suitable for production systems. Let’s get started!
Benchmarking Code
The first step is benchmarking to identify slow code. Use Go’s built-in benchmarking support:
func BenchmarkDecode(b *testing.B) {
    data := loadJSONTestData()
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        decodeJSON(data)
    }
}
This runs the decodeJSON function repeatedly while timing the duration.
Common benchmarks to try:
- marshal/unmarshal formats like JSON
- hashing and crypto algorithms
- database query functions
- encoding formats like gob
- goroutine communication
The fastest benchmarks run in just nanoseconds per operation, showing Go’s raw efficiency.
Optimizing Expensive Loops
Most programs spend the majority of their time inside loops, whether processing data or running algorithms. Optimize them diligently:
Preallocate memory
Preallocate slices and maps to expected capacity to avoid expensive reallocations:
values := make([]int, 1000000) // 1M slots allocated upfront
Length vs Capacity
When reusing a slice across iterations, reset its length rather than its capacity; the backing array is preserved, so append does not reallocate:
for i := 0; i < b.N; i++ {
    s = s[:0] // Reset length; capacity and backing array are kept
    s = append(s, struct{}{})
}
Filter early
Skip non-matching items with a cheap check before doing expensive work, and break out of loops entirely when no further iterations can match:
for _, item := range items {
    if !condition(item) {
        continue
    }
    process(item)
}
Unsafe
Use the unsafe
package to bypass type safety, and with it some bounds checking, in performance-critical sections:
import "unsafe"
...
arrayHeader := (*[1 << 16]byte)(unsafe.Pointer(&bigArray[0]))
This can remove checks in tight loops, but it sacrifices memory safety, so profile first and use it sparingly.
Reducing Memory Allocations
Frequent small object allocations kill performance in garbage-collected environments. Eliminate them:
Reuse objects
Reuse buffers and objects across iterations instead of repeatedly allocating:
buf := make([]byte, 2048) // Reuse buffer
for {
    buf = buf[:2048] // Reset length instead of reallocating
    n, err := conn.Read(buf)
    ...
}
Object pools
Maintain object pools (sync.Pool or a custom implementation) to reuse expensive objects like buffers and network connections.
Avoid repeated string allocations
Hoist fixed strings into package-level constants instead of building them per request:
const ApiKey = "128ds766f8v32dsv8d7v6f2m23" // Defined once, never reallocated
Carefully reusing objects minimizes pressure on the Go garbage collector.
Reducing Garbage Collection Pauses
Go’s garbage collector periodically pauses threads while cleaning up objects no longer in use. Manage GC pauses:
Reduce allocations
Fewer cumulatively allocated objects between GC cycles results in shorter pause times.
GOGC environment variable
Tunes the garbage collector by setting percentage of heap growth before next GC cycle:
GOGC=800 go run main.go  # Higher = less frequent GC
Disable GC
Use debug.SetGCPercent(-1) to disable automatic garbage collection entirely for ultra-low pause times, triggering collections manually with runtime.GC() at convenient points. This requires meticulously preallocating and reusing memory.
Parallelize heavily
Leverage concurrency via goroutines to minimize individual thread pause impacts.
Carefully tuning garbage collection overhead is key to reducing jitter.
Improving Goroutine Performance
Goroutines form Go’s concurrency foundation. Optimize them:
Grow strategically
Start with a goroutine pool sized to GOMAXPROCS before growing it further.
Pool goroutines
Pool reused goroutines instead of repeatedly spawning.
Avoid nesting
Deeply nested goroutine communication relies on locks and channel hops that degrade parallelism.
Lease shared resources
Lease thread-unsafe resources from a manager goroutine instead of communicating directly.
Goroutine usage dominates runtime overheads – master them to build highly concurrent Go programs.
Avoiding Performance Pitfalls
Some common ways performance regresses accidentally:
Interface dynamic dispatch
Calling methods via interface variables incurs an indirect lookup cost.
var w io.Writer = os.Stdout
w.Write([]byte("hello")) // Indirect call through the interface table
os.Stdout.Write([]byte("hello")) // Direct call on the concrete type
Excessive regex
Go’s regexp package (RE2-based) avoids pathological backtracking, but it is still considerably slower than simple string operations like strings.HasPrefix or strings.Contains. Use it judiciously.
Excessive defer
Each deferred call adds bookkeeping overhead (small since Go 1.14’s open-coded defers, but nonzero). Avoid defer in tiny, hot, low-level functions.
Init loops
Hoist setup and error handling out of hot loops so the processing loop stays tight:
// Slow: parsing and error handling mixed into the hot loop
for i := range items {
    parsed, err := parse(items[i])
    if err != nil {
        log.Println(err)
        continue
    }
    process(parsed)
}
// Fast: parse (and handle errors) up front, then process in a tight loop
parsedItems := make([]parsedItem, 0, len(items))
for i := range items {
    parsed, err := parse(items[i])
    if err != nil {
        log.Println(err)
        continue
    }
    parsedItems = append(parsedItems, parsed)
}
for i := range parsedItems {
    process(parsedItems[i])
}
Awareness of these intrinsic runtime overheads helps tune performance.
Leveraging Compiler Optimizations
Go’s compiler automatically applies optimizations like inlining, escape analysis to stack allocate objects, removal of unused code, and more.
You can inspect optimizations with verbose builds:
go build -gcflags="-m"
Some useful directives:
- //go:noinline – prevents the compiler from inlining the function that follows (useful when benchmarking)
- //go:noescape – asserts that a function’s pointer arguments do not escape to the heap
This helps validate assumptions during manual optimization.
Architecting High Performance Code
Beyond idiomatic changes, program architecture and algorithms matter:
- Parallelize – Distribute independent I/O, computation concurrently
- Partition/batch – Localize related work to utilize caches
- Precompute – Trade space for faster lookups
- Approximate – Relax precision requirements if possible
- Stream/pipeline – Chain producers to consumers
- Organize – Physical layout to optimize memory access
Well-structured programs make efficient use of machine resources.
Comparing to C and Assembly
Given Go’s high performance goals, comparisons against C and assembly illuminate the overhead delta (illustrative numbers for a CPU-bound task):
- C program: 5 seconds (compiled natively)
- Go equivalent: 6 seconds (close to optimized C)
- Ruby equivalent: 180 seconds (interpreted dynamic language)
So while Go lags behind pure native code in some cases, it comes close to C performance in many workloads while providing automatic memory management.
For maximal throughput, Go lets you implement functions in its Plan 9 assembler syntax (amd64 shown):
// func AddOne(x int64) int64 — declared in Go, implemented in assembly
TEXT ·AddOne(SB), NOSPLIT, $0-16
    MOVQ x+0(FP), AX   // Load the argument from the frame
    INCQ AX            // Custom assembly
    MOVQ AX, ret+8(FP) // Store the return value
    RET
This provides an escape hatch to SIMD instructions, vector units, and more for niche cases.
Summary
We’ve explored a multitude of Go performance optimization techniques:
- Built-in benchmarking for profiling
- Optimizing expensive loops and data access
- Reducing memory allocations significantly
- Tuning garbage collection overheads
- Improving parallel performance with goroutines
- Avoiding dynamic dispatch and regex which incur overhead
- Leveraging compiler optimizations through directives
- Comparing assembly language to understand overhead
Together these enable excellent performance across a variety of workloads, from web APIs and databases to systems tooling.
While trading raw speed for developer productivity is often wise, Go manages to provide both. By following these guidelines, your code will efficiently leverage machine resources to maximize throughput while retaining simplicity.
I hope this guide has provided a holistic overview into writing consistently fast Go. Let me know if you have any other specific areas you’d like me to discuss!
Frequently Asked Questions
Q: Does Go perform as fast as C or C++?
A: Go achieves C-like performance for network-bound workloads, but C/C++ still surpass it for raw computational speed in certain workloads.
Q: When should I focus on optimizing Go code?
A: First develop readable, maintainable idiomatic Go code. Only then benchmark and optimize sections causing performance bottlenecks per program requirements.
Q: What are some examples of optimizing expensive loops in Go?
A: Pre-allocating slices and maps, resetting slice length instead of reallocating, filtering early, avoiding bounds checks with unsafe, and parallelizing with goroutines.
Q: How does Go’s garbage collector affect performance?
A: Go GC pauses programs to recycle unused memory which hurts latency-sensitive apps. Tuning via GOGC, reducing allocations, and disabling GC help.
Q: Should I use for loops or goroutines to parallelize Go code?
A: Profile both approaches. Goroutines involve scheduling overheads but enable asynchronous concurrent work. Stateless for loops allow lock-free parallelism.
Q: When would dropping down to assembly language help?
A: In niche cases like CPU-bound processing, bit manipulation routines etc. where every CPU cycle matters. But focus on algorithm efficiency first.
Q: What are some key things to avoid in Go performance-wise?
A: Unnecessary allocations, empty interface calls costing dynamic dispatch, init code affecting hot loops, overusing defer, under-utilizing the compiler.