
Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models


Hong Yang, Devroop Kar, Qi Yu, Travis Desell, Alex Ororbia

arXiv:2603.11269v1 · Announce Type: new

Abstract: Out-of-distribution (OOD) detection methods perform well on multi-domain benchmarks, yet many practical systems are trained on single-domain data. We show that this regime induces a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features into a low-rank class subspace and suppresses directions that carry domain-shift signal. We provide theory showing that, under DSC, distance- and logit-based OOD scores lose sensitivity to domain shift. We then introduce Teacher-Guided Training (TGT), which distills class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training. The teacher and auxiliary head are discarded after training, adding no inference overhead. Across eight single-domain benchmarks, TGT yields large far-OOD FPR@95 reductions for distance-based scorers: MDS improves by 11.61 pp, ViM by 10.78 pp, and kNN by 12.87 pp (ResNet-50 average), while maintaining or slightly improving in-domain OOD and classification accuracy.

Executive Summary

This article summarizes a new approach to out-of-distribution (OOD) detection in models trained on single-domain data. The authors identify a geometric failure mode, Domain-Sensitivity Collapse (DSC), in which supervised training compresses features into a low-rank class subspace and suppresses the directions that carry domain-shift signal. To counter it, they propose Teacher-Guided Training (TGT), which distills class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training; the teacher and its auxiliary head are discarded afterward, so inference cost is unchanged. Across eight single-domain benchmarks, TGT substantially reduces far-OOD FPR@95 for distance-based scorers while maintaining or slightly improving in-domain OOD detection and classification accuracy. The work is a useful contribution toward reliable OOD detection in deployed AI systems.

Key Points

  • Domain-Sensitivity Collapse (DSC) is a geometric failure mode that occurs in single-domain models when supervised training compresses features into a low-rank class subspace.
  • Teacher-Guided Training (TGT) is a novel approach that distills class-suppressed residual structure from a frozen multi-domain teacher into the student during training.
  • TGT yields large far-OOD FPR@95 reductions for distance-based scorers while maintaining or slightly improving in-domain OOD and classification accuracy.
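The "low-rank class subspace" behind DSC can be made concrete with a toy experiment. The sketch below (an illustration, not code from the paper) measures feature collapse via effective rank, i.e. the entropy of the normalized singular values of the feature matrix; features confined to a class subspace score far lower than isotropic features.

```python
import numpy as np

def effective_rank(features, eps=1e-12):
    """Effective rank via the entropy of normalized singular values,
    a common proxy for how many directions the features actually use."""
    s = np.linalg.svd(features - features.mean(0), compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy illustration: features compressed into a rank-5 subspace
# (mimicking collapse onto class directions) vs. isotropic features.
rng = np.random.default_rng(0)
isotropic = rng.normal(size=(500, 64))
collapsed = isotropic[:, :5] @ rng.normal(size=(5, 64))  # rank-5 subspace

print(effective_rank(isotropic) > effective_rank(collapsed))  # True
```

Under DSC, distance-based OOD scores computed in the collapsed space cannot see shift along the suppressed directions, which is the failure the paper's theory formalizes.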

Merits

Theoretical foundation

The article provides a solid theoretical foundation for understanding the geometric failure mode of Domain-Sensitivity Collapse (DSC) and the proposed solution of Teacher-Guided Training (TGT).

Experimental evaluation

The article reports experiments on eight single-domain benchmarks, showing that TGT substantially reduces far-OOD FPR@95 for distance-based scorers (MDS, ViM, kNN) while preserving in-domain performance.
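For readers unfamiliar with the headline metric: FPR@95 is the false-positive rate on OOD samples at the score threshold where 95% of in-distribution samples are accepted (lower is better). A minimal reference implementation, assuming the convention that higher scores mean "more in-distribution":

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR@95: fraction of OOD samples accepted at the threshold
    that keeps 95% of in-distribution (ID) samples.
    Convention: higher score = more in-distribution."""
    threshold = np.percentile(id_scores, 5)  # accept the top 95% of ID
    return float(np.mean(np.asarray(ood_scores) >= threshold))

# Toy check: well-separated OOD scores yield a lower FPR@95.
rng = np.random.default_rng(0)
id_s = rng.normal(2.0, 1.0, 2000)
far_ood = rng.normal(-2.0, 1.0, 2000)   # easy, far-OOD-like
near_ood = rng.normal(1.0, 1.0, 2000)   # hard, near-OOD-like
print(fpr_at_95_tpr(id_s, far_ood) < fpr_at_95_tpr(id_s, near_ood))  # True
```

A "pp" (percentage point) improvement, as in the reported 11.61 pp for MDS, is an absolute drop in this quantity.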

Novel approach

TGT is a novel approach that distills class-suppressed residual structure from a frozen multi-domain teacher into the student during training, offering a new perspective on OOD detection.
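One plausible reading of "class-suppressed residual structure" is the component of the teacher's features orthogonal to the subspace spanned by per-class mean directions. The sketch below is an illustrative guess at that construction, not the paper's exact recipe; the function name and the QR-based projection are assumptions.

```python
import numpy as np

def class_suppressed_residual(teacher_feats, labels):
    """Remove the subspace spanned by per-class mean directions of the
    teacher's features, keeping only the class-orthogonal residual.
    Illustrative interpretation of 'class-suppressed residual structure'."""
    centered = teacher_feats - teacher_feats.mean(0)
    class_means = np.stack(
        [centered[labels == c].mean(0) for c in np.unique(labels)]
    )
    # Orthonormal basis of the class-mean subspace via thin QR.
    q, _ = np.linalg.qr(class_means.T)
    return centered - centered @ q @ q.T  # project out class directions
```

In TGT, an auxiliary head on the student would be trained to match this residual target during training and then discarded, consistent with the paper's claim of zero inference overhead.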

Demerits

Limited generalizability

The article focuses on single-domain models and may not be directly applicable to multi-domain models or other types of machine learning tasks.

Dependence on pre-trained teacher

TGT relies on a pre-trained multi-domain teacher (DINOv2), which adds computational and memory cost during training, even though the teacher is discarded before inference.

Expert Commentary

The article makes a meaningful contribution to out-of-distribution detection by identifying Domain-Sensitivity Collapse (DSC) in single-domain models and proposing Teacher-Guided Training (TGT) to counter it. The evaluation across eight benchmarks supports the method's effectiveness, with double-digit percentage-point reductions in far-OOD FPR@95 for distance-based scorers. That said, the findings are demonstrated in the single-domain regime and may not transfer directly to multi-domain training or other task types, and the approach depends on a strong pre-trained teacher at training time. Overall, the article offers valuable insights for building reliable AI systems.

Recommendations

  • Future research should investigate the generalizability of TGT to multi-domain models and other types of machine learning tasks.
  • The development of more efficient and scalable methods for pre-training multi-domain teachers is necessary to make TGT more practical for widespread adoption.
