All Articles

Articles

Academic · 1 min

A Rubric-Supervised Critic from Sparse Real-World Outcomes

arXiv:2603.03800v1 Announce Type: new Abstract: Academic benchmarks for coding agents tend to reward autonomous task completion, measured by verifiable rewards such as unit-test success. In …

Xingyao Wang, Valerie Chen, Heng Ji, Graham Neubig
12 views
Academic · 1 min

Phi-4-reasoning-vision-15B Technical Report

arXiv:2603.03975v1 Announce Type: new Abstract: We present Phi-4-reasoning-vision-15B, a compact open-weight multimodal reasoning model, and share the motivations, design choices, experiments, and learnings that informed …

Jyoti Aneja, Michael Harrison, Neel Joshi, Tyler LaBonte, John Langford, Eduardo Salinas
26 views