On the "Induction Bias" in Sequence Models
arXiv:2602.18333v1 Announce Type: cross Abstract: Despite the remarkable practical success of transformer-based language models, recent work has raised concerns about their ability to perform state …
arXiv:2602.18417v1 Announce Type: cross Abstract: This paper presents a direct framework for sequence models with hidden states on closed subgroups of U(d). We use a …
arXiv:2408.03099v2 Announce Type: replace Abstract: Large language models (LLMs) are increasingly used for topic modeling, outperforming classical topic models such as LDA. Commonly, pre-trained LLM …
arXiv:2602.17679v1 Announce Type: new Abstract: Bayesian optimization (BO) is a powerful method for optimizing black-box manufacturing processes, but its performance is often limited when dealing …
arXiv:2602.17680v1 Announce Type: new Abstract: Existing Protein Language Models (PLMs) often suffer from limited adaptability to multiple tasks and exhibit poor generalization across diverse biological …
arXiv:2602.17682v1 Announce Type: new Abstract: Consistency-based generative models like Shortcut and MeanFlow achieve impressive results via a target-aware design for solving the Probability Flow ODE …
arXiv:2602.17683v1 Announce Type: new Abstract: Accurate short-term forecasting of vegetation dynamics is a key enabler for data-driven decision support in precision agriculture. Normalized Difference Vegetation …
arXiv:2602.17685v1 Announce Type: new Abstract: This paper addresses the challenge of multi-target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a …
arXiv:2602.17688v1 Announce Type: new Abstract: Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program …
arXiv:2602.17697v1 Announce Type: new Abstract: Large Language Models (LLMs) are being increasingly used across a wide range of tasks. However, their substantial computational demands raise …
arXiv:2602.17706v1 Announce Type: new Abstract: Modeling long-range dependencies in time series generation poses a fundamental trade-off between representational capacity and computational efficiency. Traditional temporal diffusion …
arXiv:2602.17743v1 Announce Type: new Abstract: Large language models adapt to new tasks through in-context learning (ICL) without parameter updates. Current theoretical explanations for this capability …