Academic

Academic · 1 min

OpenSage: Self-programming Agent Generation Engine

arXiv:2602.16891v1 · Abstract: Agent development kits (ADKs) provide effective platforms and tooling for constructing agents, and their designs are critical to the constructed …

Hongwei Li, Zhun Wang, Qinrun Dai, Yuzhou Nie, Jinjun Peng, Ruitong Liu, Jingyang Zhang, Kaijie Zhu, Jingxuan He, Lun Wang, Yangruibo Ding, Yueqi Chen, Wenbo Guo, Dawn Song

27 views Feb 22

Academic · 1 min

AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks

arXiv:2602.16901v1 · Abstract: LLM agents are increasingly deployed in long-horizon, complex environments to solve challenging problems, but this expansion exposes them to long-horizon …

Tanqiu Jiang, Yuhui Wang, Jiacheng Liang, Ting Wang

30 views Feb 22

Academic · 1 min

LLM-WikiRace: Benchmarking Long-term Planning and Reasoning over Real-World Knowledge Graphs

arXiv:2602.16902v1 · Abstract: We introduce LLM-WikiRace, a benchmark for evaluating planning, reasoning, and world knowledge in large language models (LLMs). In LLM-WikiRace, models …

Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic

10 views Feb 22

Academic · 1 min

Narrow fine-tuning erodes safety alignment in vision-language agents

arXiv:2602.16931v1 · Abstract: Lifelong multimodal agents must continuously adapt to new tasks through post-training, but this creates fundamental tension between acquiring capabilities and …

Idhant Gulati, Shivam Raval

8 views Feb 22

Academic · 1 min

DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs

arXiv:2602.16935v1 · Abstract: While Large Language Model (LLM) capabilities have scaled, safety guardrails remain largely stateless, treating multi-turn dialogues as a series of …

Justin Albrethsen, Yash Datta, Kunal Kumar, Sharath Rajasekar

27 views Feb 22

Academic · 1 min

SourceBench: Can AI Answers Reference Quality Web Sources?

arXiv:2602.16942v1 · Abstract: Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence …

Hexi Jin, Stephen Liu, Yuheng Li, Simran Malik, Yiying Zhang

7 views Feb 22

Academic · 1 min

Mind the GAP: Text Safety Does Not Transfer to Tool-Call Safety in LLM Agents

arXiv:2602.16943v1 · Abstract: Large language models deployed as agents increasingly interact with external systems through tool calls--actions with real-world consequences that text outputs …

Arnold Cartagena, Ariane Teixeira

28 views Feb 22

Academic · 1 min

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

arXiv:2602.16953v1 · Abstract: Execution-aware LLM agents offer a promising paradigm for learning from tool feedback, but such feedback is often expensive and slow …

Hejia Zhang, Zhongming Yu, Chia-Tung Ho, Haoxing Ren, Brucek Khailany, Jishen Zhao

8 views Feb 22

Academic · 1 min

Automating Agent Hijacking via Structural Template Injection

arXiv:2602.16958v1 · Abstract: Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate …

Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li

24 views Feb 22

Academic · 1 min

HQFS: Hybrid Quantum Classical Financial Security with VQC Forecasting, QUBO Annealing, and Audit-Ready Post-Quantum Signing

arXiv:2602.16976v1 · Abstract: Financial risk systems usually follow a two-step routine: a …

Srikumar Nayak

13 views Feb 22

Academic · 1 min

Fundamental Limits of Black-Box Safety Evaluation: Information-Theoretic and Computational Barriers from Latent Context Conditioning

arXiv:2602.16984v1 · Abstract: Black-box safety evaluation of AI systems assumes model behavior on test distributions reliably predicts deployment performance. We formalize and challenge …

Vishal Srivastava

8 views Feb 22

Academic · 1 min

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

arXiv:2602.16990v1 · Abstract: Most recommendation benchmarks evaluate how well a model imitates user behavior. In financial advisory, however, observed actions can be noisy …

Yan Wang, Yi Han, Lingfei Qian, Yueru He, Xueqing Peng, Dongji Feng, Zhuohan Xie, Vincent Jim Zhang, Rosie Guo, Fengran Mo, Jimin Huang, Yankai Chen, Xue Liu, Jian-Yun Nie

20 views Feb 22

JCG, PC

HSOLLC Co., Ltd.