LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models

arXiv:2603.19255v1 Announce Type: cross Abstract: Despite the strong performance of Large Language Models (LLMs) on complex instruction-following tasks, precise control of output length remains a persistent challenge. Existing methods primarily attempt to enforce length constraints by externally imposing length signals or optimization objectives, while largely overlooking the underlying limitation: the model's intrinsic deficit in length cognition. To address this, we propose LARFT (Length-Aware Reinforcement Fine-Tuning), a training framework that aligns the model's length cognition with its action. Specifically, LARFT integrates length-oriented reinforcement learning with a hindsight length awareness. By transforming on-policy data into hindsight self-awareness tasks where the model learns to identify the actual length of its own generation, LARFT jointly optimizes the model's internal representation of length information and refines its policy to satisfy length constraints, thereby achieving precise and reliable length instruction following. Extensive experiments across four base models demonstrate that LARFT outperforms existing baselines, achieving an average improvement of +20.92 points across three length instruction following benchmarks with only a marginal decline of -1.45 points on four general capability benchmarks.

Executive Summary

The article proposes LARFT (Length-Aware Reinforcement Fine-Tuning), a training framework that targets a persistent weakness of Large Language Models (LLMs): precise control of output length. Rather than imposing length signals or objectives externally, LARFT addresses the model's intrinsic deficit in length cognition by integrating length-oriented reinforcement learning with hindsight length awareness. On-policy generations are transformed into hindsight self-awareness tasks in which the model learns to identify the actual length of its own output, jointly sharpening its internal representation of length and refining its policy to satisfy length constraints. Experiments across four base models show an average improvement of +20.92 points on three length instruction following benchmarks, with only a -1.45 point decline on four general capability benchmarks.

Key Points

  • LARFT addresses the underlying limitation of LLMs in length cognition
  • The framework integrates length-oriented reinforcement learning with hindsight length awareness
  • LARFT achieves precise and reliable length instruction following
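Length-oriented reinforcement learning typically scores each generation by how closely it meets the requested length. The abstract does not specify LARFT's reward, so the sketch below is a hypothetical length reward assuming word-count targets and a fixed tolerance; the function name and parameters are illustrative, not from the paper.

```python
def length_reward(response: str, target_len: int, tolerance: int = 10) -> float:
    """Reward in [0, 1] that decays as the output deviates from the target length.

    Assumes length is measured in whitespace-separated words; the paper may
    use tokens or characters instead.
    """
    actual = len(response.split())
    deviation = abs(actual - target_len)
    if deviation <= tolerance:
        return 1.0  # within tolerance: full reward
    # linear decay beyond the tolerance band, floored at zero
    return max(0.0, 1.0 - (deviation - tolerance) / target_len)
```

A shaped (rather than binary pass/fail) reward like this gives the policy a gradient even when it misses the target, which is a common design choice in RL fine-tuning for constrained generation.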

Merits

Strength of LARFT

LARFT's novel integration of length-oriented reinforcement learning with hindsight length awareness enables the model to develop a more accurate internal representation of length information, leading to improved length instruction following performance.
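The hindsight component reuses the model's own on-policy generations as supervision: each output becomes a self-awareness task whose target is that output's actual length. The exact task format is not given in the abstract, so the following is a minimal sketch with an assumed prompt template and word-count labels; all names are hypothetical.

```python
def make_hindsight_task(generation: str) -> dict:
    """Turn an on-policy generation into a length self-awareness example.

    The model is trained to answer with the true length of its own output,
    aligning its length cognition with its generation behavior.
    """
    actual_words = len(generation.split())  # word-count label, an assumption
    return {
        "prompt": (
            "How many words does the following response contain?\n\n"
            + generation
        ),
        "target": str(actual_words),
    }
```

Because the labels are computed mechanically from the model's own rollouts, such tasks require no extra human annotation and can be mixed into the same fine-tuning batches as the RL objective.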

Improved Performance

Experiments across four base models show an average improvement of +20.92 points on three length instruction following benchmarks, at the cost of only a -1.45 point average decline on four general capability benchmarks.

Robustness and Reliability

LARFT's ability to refine the model's policy to satisfy length constraints ensures precise and reliable length instruction following, making it a valuable contribution to the development of more accurate and reliable LLMs.

Demerits

Limited Generalizability

The experiments are conducted on a limited set of base models and benchmarks, which may limit the generalizability of LARFT's results to other LLM architectures and tasks.

Computational Expenses

The integration of length-oriented reinforcement learning with hindsight length awareness may incur significant computational expenses, which could be a limitation for large-scale LLM applications.

Expert Commentary

The article addresses a long-standing challenge in LLM instruction following: reliable control of output length. Its key strength is the integration of length-oriented reinforcement learning with hindsight length awareness, which lets the model build a more accurate internal representation of length rather than relying on externally imposed signals. The reported results are convincing within their scope, but further work is needed to establish generalizability beyond the four base models and three length benchmarks tested. The computational cost of reinforcement fine-tuning combined with auxiliary self-awareness tasks may also limit adoption in large-scale applications.

Recommendations

  • Further research should be conducted to explore the generalizability of LARFT to other LLM architectures and tasks.
  • Investigate methods to reduce the computational expenses associated with LARFT's implementation.

Sources

Original: arXiv - cs.AI