One attention head, expressed in three learnable numbers. Long-range mixing in O(n log n) instead of O(n2).
Tokens write into a shared medium, waves propagate through the field, heads couple during the process, and token positions read the result back out.
Instead of forming an attention matrix, Wave Field uses a causal oscillator kernel with three learnable scalars per head. Drag the controls and the kernel recomputes live.
Frequency controls pattern scale, damping controls attention range, and phase gives heads different responses. The same compact equation expresses local syntax and long document structure.
Self-attention cost grows with the square of sequence length. Field attention grows almost linearly, so the curves diverge hard exactly where long-context workloads live.