Academic

Academic · 1 min

Plato's Cave: A Human-Centered Research Verification System

arXiv:2603.23526v1 Announce Type: new Abstract: The growing publication rate of research papers has created an urgent need for better ways to fact-check information, assess writing …

Matheus Kunzler Maldaner, Raul Valle, Junsung Kim, Tonuka Sultan, Pranav Bhargava, Matthew Maloni, John Courtney, Hoang Nguyen, Aamogh Sawant, Kristian O'Connor, Stephen Wormald, Damon L. Woodard

51 views Mar 26

Academic · 1 min

Prompt Compression in Production Task Orchestration: A Pre-Registered Randomized Trial

arXiv:2603.23525v1 Announce Type: new Abstract: The economics of prompt compression depend not only on reducing input tokens but on how compression changes output length, which …

Warren Johnson, Charles Lee

24 views Mar 26

Academic · 1 min

Navigating the Concept Space of Language Models

arXiv:2603.23524v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) trained on large language model activations output thousands of features that enable mapping to human-interpretable concepts. The …

Wilson E. Marc\'ilio-Jr, Danilo M. Eler

57 views Mar 26

Academic · 1 min

Do 3D Large Language Models Really Understand 3D Spatial Relationships?

arXiv:2603.23523v1 Announce Type: new Abstract: Recent 3D Large-Language Models (3D-LLMs) claim to understand 3D worlds, especially spatial relationships among objects. Yet, we find that simply …

Xianzheng Ma, Tao Sun, Shuai Chen, Yash Bhalgat, Jindong Gu, Angel X Chang, Iro Armeni, Iro Laina, Songyou Peng, Victor Adrian Prisacariu

21 views Mar 26

Academic · 1 min

Qworld: Question-Specific Evaluation Criteria for LLMs

arXiv:2603.23522v1 Announce Type: new Abstract: Evaluating large language models (LLMs) on open-ended questions is difficult because response quality depends on the question's context. Binary scores …

Shanghua Gao, Yuchang Su, Pengwei Sui, Curtis Ginder, Marinka Zitnik

23 views Mar 26

Academic · 1 min

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

arXiv:2603.23521v1 Announce Type: new Abstract: Multimodal research has predominantly focused on single-image reasoning, with limited exploration of multi-image scenarios. Recent models have sought to enhance …

Shaharukh Khan, Ali Faraz, Abhinav Ravi, Mohd Nauman, Mohd Sarfraz, Akshat Patidar, Raja Kolla, Chandra Khatri, Shubham Agarwal

19 views Mar 26

Academic · 1 min

From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight …

arXiv:2603.23520v1 Announce Type: new Abstract: Medicine is an empirical discipline refined through long-term observation and the messy, high-variance reality of clinical practice. Physicians build diagnostic …

Chanyong Luo, Jirui Dai, Zhendong Wang, Kui Chen, Jiaxi Yang, Bingjie Lu, Jing Wang, Jiaxin Hao, Bing Li, Ruiyang He, Yiyu Qiao, Chenkai Zhang, Kaiyu Wang, Zhi Liu, Zeyu Zheng, Yan Li, Xiaohong Gu

16 views Mar 26

Academic · 1 min

MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?

arXiv:2603.23519v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various specialist domains and have been integrated into high-stakes areas such …

Lin Yang, Yuancheng Yang, Xu Wang, Changkun Liu, Haihua Yang

18 views Mar 26

Academic · 1 min

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

arXiv:2603.23518v1 Announce Type: new Abstract: General-purpose embedding models excel at recognizing semantic similarities but fail to capture the characteristics of texts specified by user instructions. …

Peijun Qing, Puneet Mathur, Nedim Lipka, Varun Manjunatha, Ryan Rossi, Franck Dernoncourt, Saeed Hassanpour, Soroush Vosoughi

24 views Mar 26

Academic · 1 min

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

arXiv:2603.23516v1 Announce Type: new Abstract: Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the …

Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen

27 views Mar 26

Academic · 1 min

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

arXiv:2603.23515v1 Announce Type: new Abstract: Improving the accuracy and reliability of medical coding reduces clinician burnout and supports revenue cycle processes, freeing providers to focus …

John Cook, Michael Wyatt, Peng Wei, Iris Chin, Santosh Gupta, Van Zyl Van Vuuren, Richie Siburian, Amanda Spicer, Kristen Viviano, Alda Cami, Raunaq Malhotra, Zhewei Yao, Jeff Rasley, Gaurav Kaushik

36 views Mar 26

Academic · 1 min

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

arXiv:2603.23514v1 Announce Type: new Abstract: Large Language Models appear competent when answering general questions but often fail when pushed into domain-specific details. No existing methodology …

Alexander Sheppert

28 views Mar 26

Plato's Cave: A Human-Centered Research Verification System

Prompt Compression in Production Task Orchestration: A Pre-Registered Randomized Trial

Navigating the Concept Space of Language Models

Do 3D Large Language Models Really Understand 3D Spatial Relationships?

Qworld: Question-Specific Evaluation Criteria for LLMs

Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

From Physician Expertise to Clinical Agents: Preserving, Standardizing, and Scaling Physicians' Medical Expertise with Lightweight …

MedMT-Bench: Can LLMs Memorize and Understand Long Multi-Turn Conversations in Medical Scenarios?

Cluster-R1: Large Reasoning Models Are Instruction-following Clustering Agents

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Training a Large Language Model for Medical Coding Using Privacy-Preserving Synthetic Clinical Data

DepthCharge: A Domain-Agnostic Framework for Measuring Depth-Dependent Knowledge in Large Language Models

JCG, PC

HSOLLC Co., Ltd.