Beyond the Class Subspace: Teacher-Guided Training for Reliable Out-of-Distribution Detection in Single-Domain Models
arXiv:2603.11269v1 Announce Type: new Abstract: Out-of-distribution (OOD) detection methods perform well on multi-domain benchmarks, yet many practical systems are trained on single-domain data. We show that this regime induces a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features into a low-rank class subspace and suppresses directions that carry domain-shift signal. We provide theory showing that, under DSC, distance- and logit-based OOD scores lose sensitivity to domain shift. We then introduce Teacher-Guided Training (TGT), which distills class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training. The teacher and auxiliary head are discarded after training, adding no inference overhead. Across eight single-domain benchmarks, TGT yields large far-OOD FPR@95 reductions for distance-based scorers: MDS improves by 11.61 pp, ViM by 10.78 pp, and kNN by 12.87 pp (ResNet-50 average), while maintaining or slightly improving in-domain OOD and classification accuracy.
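The geometric picture behind Domain-Sensitivity Collapse can be made concrete with a small numpy sketch (our own illustration, not code from the paper): project features onto the subspace spanned by the class means, and treat everything orthogonal to it as the residual. DSC is the claim that supervised training shrinks this residual, even though it is precisely where domain-shift signal lives. All array shapes and data here are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features: 100 samples, 16-dim, from 3 hypothetical classes.
features = rng.normal(size=(100, 16))
labels = rng.integers(0, 3, size=100)

# Class subspace: span of the class means (orthonormalized via QR).
class_means = np.stack([features[labels == c].mean(axis=0) for c in range(3)])
basis, _ = np.linalg.qr(class_means.T)  # shape (16, 3)

# Split each feature into its class-subspace component and the residual.
# Under DSC, training drives the residual toward zero, which is what
# strips distance-based OOD scores of their domain-shift sensitivity.
projection = features @ basis @ basis.T
residual = features - projection

# Sanity check: the residual is orthogonal to the class subspace.
assert np.abs(residual @ basis).max() < 1e-8
```

With real trained features the interesting quantity is the residual's norm relative to the projection's: a collapsed network concentrates almost all feature energy inside the class subspace.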
Executive Summary
This article presents Teacher-Guided Training (TGT), a novel approach to out-of-distribution (OOD) detection in single-domain models. The authors identify a geometric failure mode, Domain-Sensitivity Collapse (DSC): supervised training compresses features into a low-rank class subspace and suppresses the directions that carry domain-shift signal, so distance- and logit-based OOD scores lose sensitivity to domain shift. TGT counters this by distilling class-suppressed residual structure from a frozen multi-domain teacher (DINOv2) into the student during training; the teacher and auxiliary head are discarded afterwards, adding no inference overhead. Across eight single-domain benchmarks, TGT yields large far-OOD FPR@95 reductions for distance-based scorers while maintaining or slightly improving in-domain OOD detection and classification accuracy.
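As a rough sketch of the distillation idea (an assumption-laden toy, since the paper's exact objective is not reproduced here): class-suppress the frozen teacher's features by removing their class-mean subspace, then train an auxiliary head on top of the student to regress that residual. Only the head is trained in this simplified version, whereas in TGT the gradient would also shape the student backbone; the auxiliary map `W` and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical batch: student features (8-dim), teacher features (12-dim).
student_feats = rng.normal(size=(32, 8))
teacher_feats = rng.normal(size=(32, 12))
labels = rng.integers(0, 4, size=32)

# Class-suppress the teacher: remove the span of its class means,
# keeping the residual structure that carries domain information.
means = np.stack([teacher_feats[labels == c].mean(axis=0) for c in range(4)])
basis, _ = np.linalg.qr(means.T)
teacher_residual = teacher_feats - teacher_feats @ basis @ basis.T

# Auxiliary head (a plain linear map here) regresses the residual target.
# Like TGT's auxiliary head, it is discarded after training.
W = rng.normal(size=(8, 12)) * 0.1
init_loss = np.mean((student_feats @ W - teacher_residual) ** 2)
for _ in range(300):
    pred = student_feats @ W
    grad = student_feats.T @ (pred - teacher_residual) / len(student_feats)
    W -= 0.1 * grad  # gradient step on the MSE distillation objective

final_loss = np.mean((student_feats @ W - teacher_residual) ** 2)
```

The point of the sketch is the target, not the optimizer: the student is asked to predict structure the teacher retains but that single-domain supervised training would otherwise discard.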
Key Points
- ▸ Domain-Sensitivity Collapse (DSC) is a geometric failure mode that occurs in single-domain models when supervised training compresses features into a low-rank class subspace.
- ▸ Teacher-Guided Training (TGT) is a novel approach that distills class-suppressed residual structure from a frozen multi-domain teacher into the student during training.
- ▸ TGT yields large far-OOD FPR@95 reductions for distance-based scorers while maintaining or slightly improving in-domain OOD and classification accuracy.
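For readers unfamiliar with the headline metric, FPR@95 is the false-positive rate on OOD data at the threshold that accepts 95% of in-distribution samples; lower is better, and the reported pp figures are absolute reductions in it. A minimal sketch with a kNN-distance scorer (one of the three scorers in the abstract; the data and `k` are our own toy choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy in-distribution (ID) and far-OOD features.
id_train = rng.normal(0.0, 1.0, size=(500, 4))
id_test = rng.normal(0.0, 1.0, size=(200, 4))
ood_test = rng.normal(4.0, 1.0, size=(200, 4))

def knn_score(x, bank, k=10):
    """OOD score: distance to the k-th nearest neighbor in the ID feature
    bank. Larger values indicate more OOD-like inputs."""
    d = np.linalg.norm(bank[None, :, :] - x[:, None, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

id_scores = knn_score(id_test, id_train)
ood_scores = knn_score(ood_test, id_train)

# FPR@95: fraction of OOD samples scoring below the threshold that
# accepts 95% of ID test samples as in-distribution.
threshold = np.quantile(id_scores, 0.95)
fpr_at_95 = np.mean(ood_scores <= threshold)
```

Under DSC the two score distributions overlap because the residual directions separating them have been suppressed; TGT's reported 10–13 pp reductions correspond to pushing the OOD score distribution back out past the ID threshold.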
Merits
Theoretical foundation
The article provides a solid theoretical foundation for understanding the geometric failure mode of Domain-Sensitivity Collapse (DSC) and the proposed solution of Teacher-Guided Training (TGT).
Experimental evaluation
The article presents extensive experimental evaluation on eight single-domain benchmarks, demonstrating the effectiveness of TGT in improving OOD detection accuracy.
Novel approach
TGT is a novel approach that distills class-suppressed residual structure from a frozen multi-domain teacher into the student during training, offering a new perspective on OOD detection.
Demerits
Limited generalizability
The article focuses on single-domain models and may not be directly applicable to multi-domain models or other types of machine learning tasks.
Dependence on pre-trained teacher
TGT relies on a pre-trained multi-domain teacher, which may introduce additional computational and storage requirements.
Expert Commentary
The article makes a significant contribution to OOD detection by identifying Domain-Sensitivity Collapse and countering it with Teacher-Guided Training, and the evaluation across eight single-domain benchmarks substantiates the claimed gains for distance-based scorers. Two caveats temper the result. First, the findings are established for single-domain training and may not transfer directly to multi-domain models or other machine learning tasks. Second, TGT depends on a frozen pre-trained multi-domain teacher, which adds training-time compute and storage requirements, even though inference cost is unchanged. Overall, the work offers valuable insight and practical guidance for building more reliable AI systems.
Recommendations
- ✓ Future research should investigate the generalizability of TGT to multi-domain models and other types of machine learning tasks.
- ✓ The development of more efficient and scalable methods for pre-training multi-domain teachers is necessary to make TGT more practical for widespread adoption.