HiCI: Hierarchical Construction-Integration for Long-Context Attention
arXiv:2603.20843v1
Abstract: Long-context language modeling is commonly framed as a scalability challenge of token-level attention, yet local-to-global information structuring remains largely implicit in existing approaches. Drawing on cognitive theories of discourse comprehension, we propose HiCI (Hierarchical Construction-Integration), a hierarchical attention module that constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. We validate HiCI through parameter-efficient adaptation of LLaMA-2 with only <5.5% additional parameters, extending context from 4K to 100K tokens (7B) and 64K tokens (13B). Across language modeling, retrieval, and instruction-following benchmarks, HiCI yields consistent improvements over strong baselines, including matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension. These results demonstrate the effectiveness of explicit hierarchical structuring as an inductive bias for long-context modeling.
Executive Summary
This paper proposes HiCI, a hierarchical attention module for long-context language modeling, motivated by cognitive theories of discourse comprehension. HiCI constructs segment-level representations, integrates them into a shared global context, and broadcasts both to condition segment-level attention. The authors validate HiCI through parameter-efficient adaptation of LLaMA-2, adding under 5.5% additional parameters while extending the context window from 4K to 100K tokens for the 7B model and to 64K tokens for the 13B model, and report consistent improvements over strong baselines on language modeling, retrieval, and instruction-following benchmarks. The results support explicit hierarchical structuring as an effective inductive bias for long-context modeling.
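The abstract describes the construct-integrate-broadcast pipeline only at a high level, so the following PyTorch sketch shows one plausible reading of it. The class name HiCIBlock, the learned pooling query, the segment length seg_len, and the residual wiring are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class HiCIBlock(nn.Module):
    """One plausible construct-integrate-broadcast block.

    Hypothetical sketch: segment pooling via a learned query, global
    integration over segment summaries, and a broadcast step that lets
    every token attend to [segment summaries; global context]. None of
    these specifics are confirmed by the abstract.
    """

    def __init__(self, d_model: int, n_heads: int, seg_len: int):
        super().__init__()
        self.seg_len = seg_len
        self.seg_query = nn.Parameter(torch.randn(1, 1, d_model))
        self.construct = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.integrate = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.broadcast = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, D = x.shape
        assert T % self.seg_len == 0, "pad the input to a multiple of seg_len"
        S = T // self.seg_len

        # 1) Construction: pool each segment into one summary vector
        #    using a learned query.
        segs = x.reshape(B * S, self.seg_len, D)
        q = self.seg_query.expand(B * S, 1, D)
        seg_repr, _ = self.construct(q, segs, segs)      # (B*S, 1, D)
        seg_repr = seg_repr.reshape(B, S, D)

        # 2) Integration: segment summaries attend to one another,
        #    forming a shared global context.
        global_ctx, _ = self.integrate(seg_repr, seg_repr, seg_repr)  # (B, S, D)

        # 3) Broadcast: every token attends over the segment-level and
        #    global representations to condition local attention.
        ctx = torch.cat([seg_repr, global_ctx], dim=1)   # (B, 2S, D)
        out, _ = self.broadcast(x, ctx, ctx)             # (B, T, D)
        return x + out                                   # residual connection


# Shape check for the sketch:
x = torch.randn(2, 1024, 512)
y = HiCIBlock(d_model=512, n_heads=8, seg_len=128)(x)   # (2, 1024, 512)
```

What distinguishes this layout from plain blockwise attention, under these assumptions, is step 3: segment summaries and a shared global context are read back into token-level attention, rather than segments remaining isolated from one another.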
Key Points
- ▸ HiCI is a hierarchical attention module that constructs segment-level representations and integrates them into a shared global context.
- ▸ The approach adapts LLaMA-2 parameter-efficiently, adding less than 5.5% parameters while extending the context window from 4K to 100K tokens (7B) and 64K tokens (13B); a sketch of this budget check follows this list.
- ▸ HiCI yields consistent improvements over strong baselines on language modeling, retrieval, and instruction-following benchmarks, matching proprietary models on topic retrieval and surpassing GPT-3.5-Turbo-16K on code comprehension.
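A minimal sketch of what the parameter budget check could look like, assuming the HiCI layers are the only trainable additions on top of a frozen backbone. The helper name check_adaptation_budget and the freezing recipe are our assumptions; the abstract does not describe the training procedure.

```python
import torch.nn as nn


def check_adaptation_budget(base_model: nn.Module,
                            added_modules: nn.Module,
                            budget: float = 0.055) -> float:
    """Freeze the pretrained backbone, keep only the added modules
    trainable, and verify the added-parameter fraction stays under the
    budget (the paper reports <5.5% additional parameters).
    Hypothetical helper; not from the paper."""
    for p in base_model.parameters():
        p.requires_grad = False          # backbone stays fixed
    base = sum(p.numel() for p in base_model.parameters())
    added = sum(p.numel() for p in added_modules.parameters())
    frac = added / base
    assert frac < budget, f"added parameters ({frac:.1%}) exceed the budget"
    return frac
```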
Merits
Strength
The proposed hierarchical attention module, HiCI, provides a novel and effective approach to long-context language modeling, drawing on cognitive theories of discourse comprehension.
Strength
The parameter-efficient adaptation of LLaMA-2 (7B and 13B) shows that HiCI can be added to existing pretrained models at modest cost, rather than requiring training from scratch.
Strength
The consistent improvements over strong baselines across language modeling, retrieval, and instruction-following benchmarks suggest the gains are not specific to a single task.
Demerits
Limitation
The paper does not analyze the computational complexity of HiCI in depth, which may matter for large-scale deployments.
Limitation
The authors rely on cognitive theories of discourse comprehension, which may not transfer equally well across languages and discourse styles.
Expert Commentary
HiCI's main contribution is making local-to-global information structuring explicit rather than leaving it implicit in token-level attention, and grounding that design in the construction-integration account of discourse comprehension. That the module can be grafted onto LLaMA-2 with under 5.5% additional parameters, yet extend the usable context from 4K to 100K tokens, is the strongest evidence for the approach: the inductive bias appears to do real work rather than merely adding capacity. Two questions remain open. First, the cognitive grounding draws on theories developed largely for English-language discourse, and it is unclear how well segment-level structuring transfers across languages and discourse styles. Second, the computational cost of the added hierarchy is not analyzed in depth; a rough estimate under stated assumptions follows.
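As a hedged back-of-envelope estimate based on the sketch above (the notation T, S, L is ours; the paper's exact attention pattern is not given in the abstract), the per-layer attention cost splits into three terms:

```latex
% T tokens split into S segments of length L = T/S:
\underbrace{O(TL)}_{\text{construction}}
+ \underbrace{O(S^{2})}_{\text{integration}}
+ \underbrace{O(TS)}_{\text{broadcast}}
\;\ll\; O(T^{2})
\quad \text{when } L \approx S \approx \sqrt{T}.
```

Under these assumptions the hierarchy is subquadratic in sequence length, roughly O(T^{3/2}) at the balanced setting, which would be consistent with the reported 4K-to-100K extension; constants and memory traffic would still need empirical measurement.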
Recommendations
- ✓ Future research should investigate the computational complexity of HiCI and explore its applicability to various languages and cultural contexts.
- ✓ The authors should spell out which elements of construction-integration theory inform HiCI's design, so the mapping from cognitive model to architecture can be evaluated directly.
Sources
Original: arXiv - cs.CL