NeurIPS Datasets & Benchmarks Track: From Art to Science in AI Evaluations
December 5, 2025 | Communications Chairs 2025 | 2025 Conference
NeurIPS Datasets & Benchmarks Track: From Art to Science in AI Evaluations. This post provides an update …
arXiv:2603.20470v1 Announce Type: new Abstract: The rapid growth of the text-to-image (T2I) community has fostered a thriving online ecosystem of expert models, which are variants …
arXiv:2603.21013v1 Announce Type: new Abstract: Despite recent advances in integrating Large Language Models (LLMs) into social robotics, two weaknesses persist. First, existing implementations on platforms …
March 23, 2026 | Communication Chairs 2026 | 2026 Conference
Refining the Review Cycle: NeurIPS 2026 Area Chair Pilot. As NeurIPS continues to grow, we recognize that …
arXiv:2603.21321v1 Announce Type: new Abstract: Designing high-performance system heuristics is a creative, iterative process requiring experts to form hypotheses and execute multi-step conceptual shifts. While …
arXiv:2603.20911v1 Announce Type: new Abstract: Large language models make agent-based simulation more behaviorally expressive, but they also sharpen a basic methodological tension: fluent, human-like output …
arXiv:2603.20396v1 Announce Type: new Abstract: Human mathematics (HM), the mathematics humans discover and value, is a vanishingly small subset of formal mathematics (FM), the totality …
arXiv:2603.20988v1 Announce Type: new Abstract: The cognitive sciences aim to understand intelligence by formalizing underlying operations as computational models. Traditionally, this follows a cycle of …
March 23, 2026 | Communication Chairs 2026 | 2026 Conference
Introducing the Evaluations & Datasets Track at NeurIPS 2026. We are excited to announce that the Datasets …
arXiv:2603.20948v1 Announce Type: new Abstract: gUFO is a lightweight implementation of the Unified Foundational Ontology (UFO) suitable for Semantic Web OWL 2 DL applications. UFO …
arXiv:2603.20620v1 Announce Type: new Abstract: Can we trust the reasoning traces that large reasoning models (LRMs) produce? We investigate whether these traces faithfully reflect what …
arXiv:2603.20595v1 Announce Type: new Abstract: Multi-agent systems (MAS) are increasingly used in healthcare to support complex decision-making through collaboration among specialized agents. Because these systems …