Transformers
14 mentions across all digests
Transformers are a neural network architecture built on attention mechanisms that serves as the foundation for large language models, with active research into their positional encodings, geometric properties, and contextual representation dynamics.
Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models
Floating-point rounding errors trigger chaotic avalanche effects in early Transformer layers, creating three distinct behavioral regimes that fundamentally undermine determinism and reliability for agentic workflows.
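The core mechanism here, floating-point non-associativity, can be shown in a few lines. This is a minimal illustration of the kind of rounding discrepancy the summary describes, not the paper's actual experiment: summing the same values in a different order yields different results, and in a deep network such discrepancies can compound layer by layer.

```python
# Floating-point addition is not associative: reordering a sum can change
# the result, because large magnitudes absorb small ones before they cancel.
vals = [1e16, 1.0, -1e16, 1.0]

left_to_right = ((vals[0] + vals[1]) + vals[2]) + vals[3]  # 1e16 absorbs the 1.0
reordered = (vals[0] + vals[2]) + (vals[1] + vals[3])      # cancellation happens first

print(left_to_right)  # 1.0
print(reordered)      # 2.0
```

Parallel reductions on GPUs do not guarantee a fixed summation order, which is one reason bit-identical outputs are hard to obtain even at temperature zero.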
The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
Transformers learn arithmetic structure early, but behavior lags behind due to a decoder bottleneck; numeral base choice drives generalization, with task-aligned bases reaching 99.8% accuracy while binary fails entirely.
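Why base choice matters is easy to see at the input level. A hedged sketch (not the paper's setup): the same number produces digit sequences of very different lengths in different bases, changing how much carrying and sequence structure the model must track.

```python
def digits(n, base):
    """Represent a non-negative integer as a digit list in the given base,
    most significant digit first."""
    out = []
    while n:
        out.append(n % base)
        n //= base
    return out[::-1] or [0]

# The same number yields very different sequence lengths per base --
# one plausible reason base choice affects what a model must learn.
print(digits(1000, 10))  # [1, 0, 0, 0]
print(len(digits(1000, 2)))  # 10 binary digits for the same value
```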
Turbulence-like 5/3 spectral scaling in contextual representations of language as a complex system
Language models' contextual representations exhibit 5/3 power-law spectral scaling identical to that of turbulent fluids, suggesting deep structural parallels between transformer internals and complex physical systems.
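A spectral exponent like the 5/3 above is typically estimated by a log-log fit to the power spectrum. Below is a generic estimator of this kind (an assumption, not the paper's exact procedure), verified on a synthetic signal constructed to have an exact f^(-5/3) spectrum.

```python
import numpy as np

def spectral_slope(x):
    """Estimate the power-law exponent of a 1-D signal's power spectrum
    via a least-squares fit in log-log space."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(len(x))
    mask = freqs > 0  # drop the DC component before taking logs
    slope, _ = np.polyfit(np.log(freqs[mask]), np.log(power[mask]), 1)
    return slope

# Synthetic check: build a real signal whose power spectrum is exactly
# f^(-5/3), then confirm the estimator recovers that exponent.
n = 4096
rng = np.random.default_rng(0)
freqs = np.fft.rfftfreq(n)
amp = np.zeros_like(freqs)
amp[1:] = freqs[1:] ** (-5 / 6)        # power ~ f^(-5/3) => amplitude ~ f^(-5/6)
spec = amp * np.exp(2j * np.pi * rng.random(len(freqs)))
spec[-1] = amp[-1]                     # Nyquist bin must be real for a real signal
x = np.fft.irfft(spec, n)

print(spectral_slope(x))               # close to -5/3
```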
Short Data, Long Context: Distilling Positional Knowledge in Transformers
Transformers can compress positional information to extend context windows, enabling long-context performance with less training data.
On the Geometry of Positional Encodings in Transformers
Geometric analysis reveals the mathematical structure underlying transformer positional encodings, offering theoretical insight into how position is represented in the model.
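One geometric property that any such analysis starts from can be checked directly: the classic sinusoidal encodings from "Attention Is All You Need" all have the same norm, so they lie on a sphere, and position offsets act as fixed rotations in each sin/cos plane. A minimal sketch (digits are grouped sin-block then cos-block rather than interleaved as in the paper; the norm property is unaffected):

```python
import numpy as np

def sinusoidal_pe(pos, d=8, base=10000.0):
    """Sinusoidal positional encoding: sin/cos of pos at geometrically
    spaced frequencies (sins first, then cosines)."""
    i = np.arange(d // 2)
    angles = pos / base ** (2 * i / d)
    return np.concatenate([np.sin(angles), np.cos(angles)])

# Every position vector has norm sqrt(d/2): sin^2 + cos^2 = 1 per
# frequency pair, so the encodings live on a sphere.
norms = [np.linalg.norm(sinusoidal_pe(p)) for p in range(100)]
print(np.allclose(norms, norms[0]))  # True
```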