Static dispatch via generics (`fn process<T: MyTrait>(x: T)`) vs. dynamic dispatch via `dyn MyTrait`
In Rust, static dispatch (via generics with trait bounds) and dynamic dispatch (via dyn Trait) offer distinct performance profiles, critical for systems like real-time data processors. Static dispatch leverages monomorphization for speed, while dynamic dispatch uses vtables for flexibility. Below, I compare the two with an example and outline when to choose each based on performance, flexibility, and maintainability.
Example: Event Processor
Consider a system processing events (e.g., sensor readings, network packets):
```rust
trait EventProcessor {
    fn process(&mut self, event: u32) -> u32;
}

struct FastProcessor { total: u32 }
struct LogProcessor { count: u32 }

impl EventProcessor for FastProcessor {
    fn process(&mut self, event: u32) -> u32 {
        self.total += event;
        self.total
    }
}

impl EventProcessor for LogProcessor {
    fn process(&mut self, event: u32) -> u32 {
        self.count += 1;
        self.count
    }
}
```
Static Dispatch Version
```rust
fn process_static<T: EventProcessor>(processor: &mut T, events: &[u32]) -> u32 {
    let mut result = 0;
    for &event in events {
        result = processor.process(event);
    }
    result
}
```
```rust
// Usage
fn main() {
    let mut fast = FastProcessor { total: 0 };
    let events = vec![1, 2, 3];
    let total = process_static(&mut fast, &events); // 6
    println!("{}", total);
}
```
Dynamic Dispatch Version
```rust
fn process_dynamic(processor: &mut dyn EventProcessor, events: &[u32]) -> u32 {
    let mut result = 0;
    for &event in events {
        result = processor.process(event);
    }
    result
}
```
```rust
// Usage
fn main() {
    let mut fast = FastProcessor { total: 0 };
    let events = vec![1, 2, 3];
    let total = process_dynamic(&mut fast, &events); // 6
    let mut log = LogProcessor { count: 0 };
    let count = process_dynamic(&mut log, &events); // 3
    println!("{} {}", total, count);
}
```
Performance Trade-Offs
Static Dispatch
- Mechanism: The compiler monomorphizes `process_static` for each type (e.g., `FastProcessor`, `LogProcessor`), generating a separate specialized function per instantiation.
- Speed: No runtime overhead; calls to `process` are inlined, enabling further optimizations (e.g., loop unrolling, constant folding). On x86_64, this might compile to a tight `add` loop with no call instructions.
- Cost: Larger binary size (e.g., ~100 bytes per monomorphized function). For 10 processor types, that's ~1KB extra in `.text`.
- Assembly Example:

```asm
; process_static::<FastProcessor>
    xor eax, eax        ; result = 0
loop:
    add eax, [rsi]      ; total += event
    add rsi, 4          ; advance to next u32
    dec rcx
    jnz loop
```
Dynamic Dispatch
- Mechanism: `dyn EventProcessor` uses a vtable (a per-type table of method pointers) reached through a fat pointer; a `Box<dyn EventProcessor>` is 16 bytes on 64-bit targets: 8 for the data pointer, 8 for the vtable pointer.
- Speed: Slower due to indirect calls through the vtable (cheap when branch-predicted, much costlier on a mispredict) and no inlining across the call boundary. Cache misses on vtable access add latency.
- Cost: Smaller binary; one `process_dynamic` function (e.g., ~50 bytes) works for all types, so total size stays roughly constant regardless of processor count.
- Assembly Example:

```asm
; process_dynamic
loop:
    mov rax, [rdi + 8]  ; load vtable pointer
    call [rax]          ; indirect call to process
    add rsi, 4
    dec rcx
    jnz loop
```

- Quantified: For 1M events, static might take ~1ms (pure arithmetic) while dynamic takes ~1.2ms (call overhead and lost inlining). A 20% difference matters in real-time systems.
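The fat-pointer layout described above can be checked directly with `std::mem::size_of`; a minimal sketch (sizes assume a typical 64-bit target, and the trait is redeclared so the snippet compiles on its own):

```rust
use std::mem::size_of;

// Redeclared here so the snippet is self-contained.
trait EventProcessor {
    fn process(&mut self, event: u32) -> u32;
}

fn main() {
    // A plain reference is one pointer wide: 8 bytes on 64-bit.
    println!("{}", size_of::<&u32>()); // 8
    // A trait-object reference is a fat pointer: data ptr + vtable ptr.
    println!("{}", size_of::<&dyn EventProcessor>()); // 16
    // Box<dyn Trait> carries the same pair of pointers.
    println!("{}", size_of::<Box<dyn EventProcessor>>()); // 16
}
```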
Scenarios and Preferences
Choose Static Dispatch
- Scenario: Hot loops in a real-time data processor (e.g., audio filtering, packet routing) where every cycle counts.
- Why: Zero overhead, inlining, and optimization potential. In `process_static`, the compiler can unroll or vectorize the loop over `u32` events.
- Trade-Off: Larger binary, but acceptable for a known, small set of processors (e.g., 2-5 types).
- Maintainability: Less flexible; adding a new processor requires recompilation.
Choose Dynamic Dispatch
- Scenario: Plugin system or runtime-configurable processors (e.g., users load `EventProcessor` implementations dynamically).
- Why: Flexibility; `dyn EventProcessor` lets a single function handle any implementing type without recompiling. Binary size stays manageable with many processors.
- Trade-Off: Slower at runtime, but acceptable if `process` is complex (call overhead is a smaller fraction of total work) or invocation is infrequent.
- Maintainability: Easier to extend; new types just implement the trait.
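To make that flexibility concrete: with trait objects, one collection can hold differently-typed processors chosen at runtime, which a single generic parameter cannot express (`Vec<T>` needs one concrete `T`). A minimal sketch reusing the types from the example:

```rust
trait EventProcessor {
    fn process(&mut self, event: u32) -> u32;
}

struct FastProcessor { total: u32 }
struct LogProcessor { count: u32 }

impl EventProcessor for FastProcessor {
    fn process(&mut self, event: u32) -> u32 { self.total += event; self.total }
}
impl EventProcessor for LogProcessor {
    fn process(&mut self, event: u32) -> u32 { self.count += 1; self.count }
}

fn main() {
    // Heterogeneous pipeline: each element may be a different concrete type.
    let mut pipeline: Vec<Box<dyn EventProcessor>> = vec![
        Box::new(FastProcessor { total: 0 }),
        Box::new(LogProcessor { count: 0 }),
    ];
    for p in pipeline.iter_mut() {
        let out = p.process(7); // dispatched through the vtable
        println!("{}", out);    // 7, then 1
    }
}
```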
Verification
- Benchmark: With Criterion, compare both paths; expect static to be 10-20% faster.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench(c: &mut Criterion) {
    let events = vec![1u32; 1000];
    c.bench_function("static", |b| {
        let mut fast = FastProcessor { total: 0 };
        b.iter(|| process_static(black_box(&mut fast), black_box(&events)))
    });
    c.bench_function("dynamic", |b| {
        let mut fast = FastProcessor { total: 0 };
        b.iter(|| process_dynamic(black_box(&mut fast), black_box(&events)))
    });
}

criterion_group!(benches, bench);
criterion_main!(benches);
```

- Size: `size target/release/app` shows static bloating `.text` per type.
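For a quick sanity check without Criterion, `std::time::Instant` can time both paths directly; a rough sketch (compile with `--release`, and treat the absolute numbers as illustrative only):

```rust
use std::time::Instant;

trait EventProcessor {
    fn process(&mut self, event: u32) -> u32;
}

struct FastProcessor { total: u32 }

impl EventProcessor for FastProcessor {
    fn process(&mut self, event: u32) -> u32 { self.total += event; self.total }
}

fn process_static<T: EventProcessor>(p: &mut T, events: &[u32]) -> u32 {
    let mut result = 0;
    for &e in events { result = p.process(e); }
    result
}

fn process_dynamic(p: &mut dyn EventProcessor, events: &[u32]) -> u32 {
    let mut result = 0;
    for &e in events { result = p.process(e); }
    result
}

fn main() {
    let events = vec![1u32; 1_000_000];

    let mut p = FastProcessor { total: 0 };
    let t = Instant::now();
    let a = process_static(&mut p, &events);
    let static_time = t.elapsed();

    let mut p = FastProcessor { total: 0 };
    let t = Instant::now();
    let b = process_dynamic(&mut p, &events);
    let dynamic_time = t.elapsed();

    assert_eq!(a, b); // same result either way; only the dispatch differs
    println!("static: {:?}, dynamic: {:?}", static_time, dynamic_time);
}
```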
Conclusion
In a real-time data processor, prefer static dispatch (`process_static`) for hot paths, trading code size for speed and inlining. For flexibility (e.g., pluggable processors), use `dyn EventProcessor`, accepting vtable costs. Profile to ensure static's gains justify its footprint, balancing performance with system design goals.