Language models through physics
WF-AI
Field Attention · Fig. 01
Wave Field Labs · 2026
$k(t) = e^{-\alpha t} \cos(\omega t + \phi)$
ω (frequency): pattern scale
α (damping): attention reach
φ (phase): per-head response

One attention head, expressed in three learnable numbers. Long-range mixing in O(n log n) instead of O(n²).

125M research preview · 128K context
01 / Architecture

Not another attention layer. A field-based route.

Tokens write into a shared medium, waves propagate through the field, heads couple during the process, and token positions read the result back out.

01
Scatter
Tokens deposit value information onto a one-dimensional continuous field using bilinear interpolation.
02
Propagate
A causal damped-wave kernel convolves the field via FFT, mixing information across all positions in O(n log n).
03
Couple
Heads interact during attention, exchanging field state so they specialise instead of acting independently.
04
Gather
Each token position samples the propagated field back out, reading a context-mixed value at its location.
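The four steps can be sketched for a single head and a single scalar value channel in NumPy. This is a minimal sketch under stated assumptions, not Wave Field's implementation. The helper names (`scatter`, `propagate`, `gather`, `field_attention_head`) and the token-to-field mapping `positions = 2*i + 0.5` are hypothetical. Interpolation here splits each token's value between its two nearest cells, assuming that is what bilinear interpolation amounts to on a one-dimensional field. The Couple step is left out because the page does not say how heads exchange field state.

```python
import numpy as np

def damped_wave_kernel(size, omega, alpha, phi):
    """Sample k(t) = exp(-alpha*t) * cos(omega*t + phi) at t = 0, 1, ..., size-1."""
    t = np.arange(size, dtype=np.float64)
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

def scatter(values, positions, size):
    """Step 1 (Scatter): split each token's value between its two nearest field cells."""
    field = np.zeros(size)
    left = np.floor(positions).astype(int)
    frac = positions - left
    right = np.minimum(left + 1, size - 1)
    np.add.at(field, left, values * (1.0 - frac))  # unbuffered add, safe if cells repeat
    np.add.at(field, right, values * frac)
    return field

def propagate(field, kernel):
    """Step 2 (Propagate): causal y[i] = sum_{j<=i} kernel[i-j] * field[j] via FFT."""
    m = len(field)
    n_fft = 2 * m  # zero-pad so the circular FFT convolution does not wrap around
    spectrum = np.fft.rfft(field, n_fft) * np.fft.rfft(kernel, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:m]

def gather(field, positions):
    """Step 4 (Gather): read the propagated field at each token position."""
    left = np.floor(positions).astype(int)
    frac = positions - left
    right = np.minimum(left + 1, len(field) - 1)
    return field[left] * (1.0 - frac) + field[right] * frac

def field_attention_head(values, positions, size, omega, alpha, phi):
    """One head: scatter -> propagate -> gather (Couple step omitted)."""
    field = scatter(values, positions, size)
    mixed = propagate(field, damped_wave_kernel(size, omega, alpha, phi))
    return gather(mixed, positions)

# Usage: 8 tokens on a 16-cell field; the mapping 2*i + 0.5 is a hypothetical choice.
rng = np.random.default_rng(0)
n = 8
values = rng.standard_normal(n)
positions = 2.0 * np.arange(n) + 0.5
out = field_attention_head(values, positions, 2 * n, omega=2.0, alpha=0.45, phi=0.0)
print(out.shape)  # (8,)
```

Padding the FFT to twice the field length makes the circular convolution equal to a linear one on the first m outputs, which is what keeps Propagate causal. Causality also depends on the position mapping: each token must read only cells that no later token writes to, which the mapping above guarantees.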
02 / Kernel

Each head becomes a damped wave.

Instead of forming an attention matrix, Wave Field uses a causal oscillator kernel with three learnable scalars per head.
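Written as an equation, the propagate step replaces the attention matrix with a causal convolution. The notation below is a hedged reconstruction from the description on this page, assuming a discrete field $f_h$ indexed by cell $i$ for head $h$:

```latex
y_h[i] = \sum_{j=0}^{i} k_h(i - j)\, f_h[j],
\qquad
k_h(t) = e^{-\alpha_h t} \cos(\omega_h t + \phi_h), \quad t \ge 0
```

Because the weight between cells $i$ and $j$ depends only on the lag $i - j$, a head needs just $(\omega_h, \alpha_h, \phi_h)$ rather than an $n \times n$ matrix, and the sum can be evaluated with the FFT in O(n log n).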


Frequency controls pattern scale, damping controls attention range, and phase gives each head a different response. As a result, the same compact equation can express both local syntax (strong damping, short reach) and long-range document structure (weak damping, long reach).

$k(t) = e^{-0.45t} \cos(2.00t + 0.00), \quad t \ge 0$
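To make "damping controls attention range" concrete, the sketch below evaluates the kernel and a rough reach. Reach is defined here (an assumption, not a definition from this page) as the lag at which the envelope $e^{-\alpha t}$ falls to 1%, i.e. $t = \ln(100)/\alpha$. The helper names `kernel` and `reach` and the contrasting value α = 0.05 are hypothetical.

```python
import math

def kernel(t, omega, alpha, phi):
    """Causal damped-wave kernel: exp(-alpha*t) * cos(omega*t + phi) for t >= 0, else 0."""
    if t < 0:
        return 0.0
    return math.exp(-alpha * t) * math.cos(omega * t + phi)

def reach(alpha, tol=0.01):
    """Hypothetical reach metric: lag where the envelope exp(-alpha*t) drops to `tol`."""
    return math.log(1.0 / tol) / alpha

print(kernel(0, 2.0, 0.45, 0.0))  # 1.0, since k(0) = cos(phi) = 1 when phi = 0
for alpha in (0.45, 0.05):
    print(f"alpha={alpha}: envelope below 1% after ~{reach(alpha):.1f} positions")
# alpha=0.45: envelope below 1% after ~10.2 positions
# alpha=0.05: envelope below 1% after ~92.1 positions
```

Under this metric, α = 0.45 confines a head to roughly its last ten positions, while α = 0.05 lets it look back about ninety; ω then sets how finely the weights oscillate inside that window.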
03 / Complexity

The wall standard attention hits at long sequence lengths.

Self-attention cost grows with the square of sequence length, while field attention grows as O(n log n), almost linearly. The two curves therefore diverge sharply at exactly the lengths where long-context workloads live.
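A back-of-the-envelope comparison shows how quickly the gap opens. The sketch counts n² pairwise scores for dense attention against n·log₂n as a proxy for FFT cost. Constant factors, head dimension and memory traffic are ignored, so treat the ratios as orders of magnitude, not measured speedups. The 131,072 entry assumes the 128K context means 2^17 tokens.

```python
import math

def attention_cost(n):
    """Pairwise score count for dense self-attention: n^2."""
    return n * n

def field_cost(n):
    """Rough FFT cost proxy: n * log2(n), constant factors ignored."""
    return n * math.log2(n)

for n in (1_024, 8_192, 131_072):
    ratio = attention_cost(n) / field_cost(n)
    print(f"n={n:>7}: n^2 / (n log2 n) = {ratio:,.0f}")
# n=   1024: n^2 / (n log2 n) = 102
# n=   8192: n^2 / (n log2 n) = 630
# n= 131072: n^2 / (n log2 n) = 7,710
```

The ratio simplifies to n / log₂n, which is why it keeps growing almost linearly with context length.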

Figure: compute cost vs. sequence length, O(n²) standard attention vs. O(n log n) field attention.