Academic

Beyond Logit Adjustment: A Residual Decomposition Framework for Long-Tailed Reranking

arXiv:2604.01506v1 Announce Type: new Abstract: Long-tailed classification, where a small number of frequent classes dominate many rare ones, remains challenging because models systematically favor frequent classes at inference time. Existing post-hoc methods such as logit adjustment address this by adding a fixed classwise offset to the base-model logits. However, the correction required to restore the relative ranking of two classes need not be constant across inputs, and a fixed offset cannot adapt to such variation. We study this problem through Bayes-optimal reranking on a base-model top-k shortlist. The gap between the optimal score and the base score, the residual correction, decomposes into a classwise component that is constant within each class, and a pairwise component that depends on the input and competing labels. When the residual is purely classwise, a fixed offset suffices to recover the Bayes-optimal ordering. We further show that when the same label pair induces incompatible ordering constraints across contexts, no fixed offset can achieve this recovery. This decomposition leads to testable predictions regarding when pairwise correction can improve performance and when it cannot. We develop REPAIR (Reranking via Pairwise residual correction), a lightweight post-hoc reranker that combines a shrinkage-stabilized classwise term with a linear pairwise term driven by competition features on the shortlist. Experiments on five benchmarks spanning image classification, species recognition, scene recognition, and rare disease diagnosis confirm that the decomposition explains where pairwise correction helps and where classwise correction alone suffices.
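To make the baseline concrete, the sketch below shows standard post-hoc logit adjustment, where a fixed classwise offset (here, a log-prior term scaled by a temperature `tau`) is added to the base logits. The class names, priors, and logits are illustrative assumptions, not values from the paper; the point is that a fixed offset can only capture the classwise part of the residual correction.

```python
import numpy as np

# Hypothetical 3-class long-tailed setup (values are illustrative, not
# taken from the paper). Classic logit adjustment subtracts a fixed
# classwise offset tau * log(prior) from the base-model logits.
priors = np.array([0.80, 0.15, 0.05])   # head class dominates
tau = 1.0                                # adjustment temperature

def logit_adjust(logits, priors, tau=1.0):
    """Post-hoc logit adjustment: one fixed offset per class."""
    return logits - tau * np.log(priors)

base_logits = np.array([2.0, 1.7, 1.2])
adjusted = logit_adjust(base_logits, priors, tau)

print(int(np.argmax(base_logits)))  # 0: frequent class wins on raw logits
print(int(np.argmax(adjusted)))     # 2: tail class overtakes after adjustment
```

Because the offset is identical for every input, it cannot express the pairwise, input-dependent component of the residual that the decomposition above isolates.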

Executive Summary

This article introduces a novel residual decomposition framework for long-tailed reranking, challenging the conventional reliance on fixed logit adjustment offsets. By decomposing the residual correction into a classwise component (constant per class) and a pairwise component (input-dependent), the authors demonstrate that while a fixed offset suffices when corrections are classwise, pairwise adjustments become necessary when contextual constraints conflict. This insight leads to the development of REPAIR, a lightweight post-hoc reranker integrating both classwise and pairwise terms. Experiments across diverse domains confirm the decomposition's explanatory power. The work advances understanding of when post-hoc reranking interventions are effective, offering nuanced guidance beyond generic offsets.

Key Points

  • Residual decomposition reveals dual components: constant classwise and input-dependent pairwise.
  • Fixed logit offsets are insufficient when pairwise constraints conflict across contexts.
  • REPAIR introduces a hybrid reranker combining shrinkage-stabilized classwise and linear pairwise terms.
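The hybrid scoring idea in the last point can be sketched as follows. This is a minimal illustration under assumed parameterizations: the shrinkage rule, feature definitions, and all numeric values are hypothetical, since the abstract does not specify REPAIR's exact form.

```python
import numpy as np

# Illustrative REPAIR-style reranker (all names, features, and values
# are assumptions for exposition, not the paper's exact formulation).
# Shortlist score = base logit
#   + shrinkage-stabilized classwise offset
#   + linear pairwise term over competition features.

def shrinkage_offsets(raw_offsets, counts, lam=10.0):
    """Shrink per-class offsets toward zero; rarer classes (smaller
    counts) are shrunk more aggressively for stability."""
    counts = np.asarray(counts, dtype=float)
    return np.asarray(raw_offsets) * counts / (counts + lam)

def rerank(base_logits, shortlist, offsets, w, features):
    """Rescore the top-k shortlist; features[i] is an assumed
    competition-feature vector for shortlist class i."""
    scores = {}
    for i, c in enumerate(shortlist):
        pairwise = float(w @ features[i])        # linear pairwise term
        scores[c] = base_logits[c] + offsets[c] + pairwise
    return max(scores, key=scores.get)

# Toy shortlist of 3 classes with a long-tailed count profile.
base = np.array([2.0, 1.9, 1.5])
offs = shrinkage_offsets([0.0, 0.3, 1.0], counts=[1000, 50, 5])
w = np.array([0.5, -0.2])
feats = np.array([[0.1, 0.4], [0.3, 0.1], [1.0, 0.0]])

print(rerank(base, [0, 1, 2], offs, w, feats))  # 2: tail class selected
```

The design mirrors the key points: the classwise term alone is a (stabilized) fixed offset, while the pairwise term lets the correction vary with the input's competition structure on the shortlist.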

Merits

Theoretical Clarity

The decomposition framework offers a rigorous, mathematically grounded explanation of reranking dynamics, moving beyond heuristic logit adjustments.

Practical Relevance

REPAIR provides a lightweight, implementable solution validated empirically across multiple high-stakes domains, enhancing applicability in real-world systems.

Demerits

Complexity of Implementation

While REPAIR is lightweight, integrating pairwise features may introduce logistical challenges in deployment, particularly in systems constrained by computational resources or data availability.

Scope Limitation

The study focuses on post-hoc reranking; it does not address upstream model training or architectural biases contributing to long-tailed disparities.

Expert Commentary

The article represents a significant methodological advancement in the field of post-hoc reranking for long-tailed classification. Historically, logit adjustment has served as a default corrective measure due to its simplicity, yet its one-size-fits-all nature has masked deeper structural issues in inference dynamics. The authors' decomposition into classwise and pairwise components is both elegant and empirically grounded, demonstrating not only the mathematical validity of their claims but also their operational applicability. The development of REPAIR as a modular reranker exemplifies best-practice design: interpretable, lightweight, and empirically validated. Importantly, the study's experimental validation across five heterogeneous domains (image, species, scene, disease) lends substantial credibility to the generalizability of findings. This work bridges a critical gap between theoretical insight and practical intervention, and may influence future design of reranking pipelines in both academic and applied domains. The critique of the fixed offset paradigm is particularly timely, as AI systems increasingly rely on ensemble or top-k ranking outputs where contextual nuance matters.

Recommendations

  • Adopt REPAIR or similar pairwise-aware rerankers in systems where contextual label conflicts are empirically observed.
  • Revise evaluation protocols for reranking methods to include contextual constraint diagnostics as standard metrics.

Sources

Original: arXiv - cs.LG