
The Phasor Transformer: Resolving Attention Bottlenecks on the Unit Circle


Dibakar Sigdel

arXiv:2603.17433v1 Announce Type: new Abstract: Transformer models have redefined sequence learning, yet dot-product self-attention introduces a quadratic token-mixing bottleneck for long-context time-series. We introduce the Phasor Transformer block, a phase-native alternative representing sequence states on the unit-circle manifold S^1. Each block combines lightweight trainable phase-shifts with parameter-free Discrete Fourier Transform (DFT) token coupling, achieving global O(N log N) mixing without explicit attention maps. Stacking these blocks defines the Large Phasor Model (LPM). We validate LPM on autoregressive time-series prediction over synthetic multi-frequency benchmarks. Operating with a highly compact parameter budget, LPM learns stable global dynamics and achieves competitive forecasting behavior compared to conventional self-attention baselines. Our results establish an explicit efficiency-performance frontier, demonstrating that large-model scaling for time-series can emerge from geometry-constrained phase computation with deterministic global coupling, offering a practical path toward scalable temporal modeling in oscillatory domains.
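The abstract does not spell out the block equations, but the two ingredients it names (trainable phase-shifts plus parameter-free DFT token coupling) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `phasor_block`, the choice of a single per-position shift vector, and the FNet-style one-way DFT mixing are all assumptions for exposition.

```python
import numpy as np

def phasor_block(theta, phase_shift):
    """Hypothetical phase-native mixing step (sketch, not the paper's code).

    theta:       (N,) sequence state as angles on the unit circle S^1.
    phase_shift: (N,) trainable per-position phase offsets -- in this
                 sketch, the block's only learned parameters.
    """
    # 1. Lightweight trainable phase shift: elementwise, O(N).
    z = np.exp(1j * (theta + phase_shift))  # lift angles onto S^1
    # 2. Parameter-free DFT token coupling: each output coefficient is a
    #    fixed global combination of all N tokens, computed in O(N log N)
    #    by the FFT -- no N x N attention map is ever formed.
    z_mixed = np.fft.fft(z, norm="ortho")
    # 3. Project back onto the unit-circle state by keeping only the phase.
    return np.angle(z_mixed)

# Toy usage on a random length-8 sequence.
rng = np.random.default_rng(0)
theta = rng.uniform(-np.pi, np.pi, 8)
shift = rng.uniform(-0.1, 0.1, 8)       # stands in for learned parameters
out = phasor_block(theta, shift)        # (8,) angles in (-pi, pi]
```

Stacking several such blocks, with the phase-shift vectors trained by backpropagation, would correspond to the Large Phasor Model described above.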

Executive Summary

This article introduces the Phasor Transformer, a phase-native alternative to conventional self-attention mechanisms in transformer models. By representing sequence states on the unit-circle manifold and mixing tokens with the Discrete Fourier Transform, the Phasor Transformer achieves global O(N log N) mixing without explicit attention maps. Stacking these blocks yields the Large Phasor Model (LPM), designed to handle long-context time-series efficiently. Validation on synthetic multi-frequency benchmarks demonstrates competitive forecasting performance with a highly compact parameter budget, establishing an explicit efficiency-performance frontier and offering a practical path toward scalable temporal modeling in oscillatory domains.

Key Points

  • Introduction of the Phasor Transformer, a phase-native alternative to self-attention
  • Use of unit-circle manifold and Discrete Fourier Transform for global O(N log N) mixing
  • Proposal of the Large Phasor Model (LPM) for effective handling of long-context time-series
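The claim behind the second point, that DFT coupling gives global mixing without an explicit attention map, can be checked directly: an FFT-based circular convolution is mathematically identical to multiplying by a dense N x N circulant coupling matrix, yet costs O(N log N) and never materializes that matrix. The kernel `k` below is an arbitrary fixed filter chosen for illustration, not a quantity from the paper.

```python
import numpy as np

def fft_couple(x, kernel):
    """Circular convolution via the FFT convolution theorem, O(N log N)."""
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel))

N = 16
rng = np.random.default_rng(1)
x = np.exp(1j * rng.uniform(-np.pi, np.pi, N))  # unit-circle tokens
k = rng.standard_normal(N)                      # fixed, parameter-free kernel

fast = fft_couple(x, k)

# Reference: the same coupling as an explicit dense circulant
# matrix-vector product, which would cost O(N^2) per layer.
C = np.array([[k[(i - j) % N] for j in range(N)] for i in range(N)])
slow = C @ x

assert np.allclose(fast, slow)  # identical global mixing, no N x N map built
```

This is the efficiency argument in miniature: the deterministic global coupling of every token with every other token survives, while the quadratic object that self-attention constructs explicitly is replaced by two FFTs.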

Merits

Strength

The Phasor Transformer achieves competitive forecasting performance with a highly compact parameter budget, making it a scalable and efficient solution for time-series prediction.

Demerits

Limitation

The article relies on synthetic multi-frequency benchmarks for validation, which may not reflect real-world time-series characteristics.

Expert Commentary

The Phasor Transformer is a promising alternative to conventional self-attention, offering a scalable and efficient approach to time-series prediction. The article's contributions are significant, and the proposed LPM architecture shows clear potential for real-world applications. However, validation on real-world datasets and a systematic study of failure modes are still needed to understand where the approach breaks down, for example on aperiodic or non-stationary signals that lack a natural phase representation. The efficiency-performance frontier the authors establish is an important finding with substantial implications for time-series analysis and prediction.

Recommendations

  • Further validation of the Phasor Transformer on real-world time-series datasets is necessary to fully understand its capabilities and limitations.
  • Exploration of the Phasor Transformer's application to various domains, such as finance, healthcare, and climate modeling, is recommended to fully realize its potential.
