CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection

arXiv:2603.12453v1. Abstract: This paper describes our system for SemEval-2026 Task 6, which classifies clarity of responses in political interviews into three categories: Clear Reply, Ambivalent, and Clear Non-Reply. We propose a heterogeneous dual large language model (LLM) ensemble via self-consistency (SC) and weighted voting, and a novel post-hoc correction mechanism, Deliberative Complexity Gating (DCG). This mechanism uses cross-model behavioral signals and exploits the finding that an LLM response-length proxy correlates strongly with sample ambiguity. To further examine mechanisms for improving ambiguity detection, we evaluated multi-agent debate as an alternative strategy for increasing deliberative capacity. Unlike DCG, which adaptively gates reasoning using cross-model behavioral signals, debate increases agent count without increasing model diversity. Our solution achieved a Macro-F1 score of 0.85 on the evaluation set, securing 3rd place.

Executive Summary

This paper presents a two-stage heterogeneous ensemble for SemEval-2026 Task 6, which classifies responses in political interviews as Clear Reply, Ambivalent, or Clear Non-Reply. The system achieved a Macro-F1 score of 0.85 on the evaluation set and placed 3rd. It combines two different large language models (LLMs) through self-consistency (SC) sampling and weighted voting, then applies a post-hoc correction mechanism, Deliberative Complexity Gating (DCG). DCG relies on cross-model behavioral signals and on the finding that an LLM response-length proxy correlates strongly with sample ambiguity. The authors also evaluate multi-agent debate as an alternative way to increase deliberative capacity, noting that, unlike DCG, debate adds agents without adding model diversity.
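For readers unfamiliar with the metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so each of the three classes counts equally regardless of how often it occurs. The sketch below computes it on made-up toy labels; the labels are illustrative only and are not data from the paper.

```python
LABELS = ("Clear Reply", "Ambivalent", "Clear Non-Reply")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Return the unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples gets F1 = 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy gold labels and predictions (hypothetical, for illustration only).
toy_true = ["Clear Reply", "Clear Reply", "Ambivalent",
            "Ambivalent", "Clear Non-Reply", "Clear Non-Reply"]
toy_pred = ["Clear Reply", "Ambivalent", "Ambivalent",
            "Ambivalent", "Clear Non-Reply", "Clear Reply"]

print(macro_f1(toy_true, toy_pred))
```

On these toy labels the per-class F1 scores are 0.5, 0.8, and 2/3, and Macro-F1 is their plain average.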

Key Points

  • Proposed a two-stage heterogeneous ensemble system for detecting political evasion
  • Introduced Deliberative Complexity Gating (DCG) as a post-hoc correction mechanism
  • Explored multi-agent debate as an alternative strategy for increasing deliberative capacity

Merits

Strength in ensemble methodology

The ensemble pairs two different large language models and merges their self-consistency samples through weighted voting, so the final label draws on model diversity rather than on repeated sampling from a single model.
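As a rough illustration of how self-consistency samples from two models can be merged by weighted voting, consider the sketch below. The model names, the weights, and the sampled labels are hypothetical; the abstract does not specify them, and the authors' actual voting scheme may differ.

```python
from collections import Counter

LABELS = ("Clear Reply", "Ambivalent", "Clear Non-Reply")


def self_consistency_distribution(samples):
    """Turn repeated samples from one model into a label distribution."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(samples)
    return {label: counts.get(label, 0) / len(samples) for label in LABELS}


def weighted_vote(samples_by_model, weights):
    """Combine per-model distributions using per-model weights."""
    scores = {label: 0.0 for label in LABELS}
    for model, samples in samples_by_model.items():
        dist = self_consistency_distribution(samples)
        for label in LABELS:
            scores[label] += weights[model] * dist[label]
    # max() keeps the first maximum, so ties go to the earliest label in LABELS.
    best = max(LABELS, key=lambda label: scores[label])
    return best, scores


# Hypothetical samples: three self-consistency draws per model.
samples_by_model = {
    "model_a": ["Ambivalent", "Ambivalent", "Clear Reply"],
    "model_b": ["Clear Reply", "Clear Reply", "Clear Reply"],
}
weights = {"model_a": 0.6, "model_b": 0.4}  # assumed values, not from the paper

label, scores = weighted_vote(samples_by_model, weights)
print(label, scores)
```

In this toy case model_a leans Ambivalent, but the unanimous model_b pushes the weighted score for Clear Reply (about 0.6) above Ambivalent (about 0.4).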

Novelty of Deliberative Complexity Gating

The introduction of Deliberative Complexity Gating (DCG) as a post-hoc correction mechanism provides a novel approach to adapting reasoning using cross-model behavioral signals, enhancing the accuracy of ambiguity detection.
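The abstract does not give the gating rule, so the following is only a hypothetical sketch of what a post-hoc gate driven by cross-model signals could look like. It assumes, purely for illustration, that disagreement between the two models combined with a long mean response length marks a sample as ambiguous. The function name, the threshold of 400, and the use of character counts as the length proxy are all invented here.

```python
def complexity_gate(ensemble_label, model_labels, response_lengths,
                    length_threshold=400):
    """Hypothetical post-hoc correction; not the authors' actual DCG rule.

    ensemble_label:   label produced by the voting stage
    model_labels:     one majority label per model
    response_lengths: one response length per model (assumed: characters)
    length_threshold: assumed cut-off above which a response counts as long
    """
    if not response_lengths:
        raise ValueError("at least one response length is required")
    models_disagree = len(set(model_labels)) > 1
    mean_length = sum(response_lengths) / len(response_lengths)
    looks_complex = mean_length > length_threshold
    # Override only when both signals point to ambiguity.
    if models_disagree and looks_complex:
        return "Ambivalent"
    return ensemble_label


# The models disagree and both wrote long responses, so the gate fires.
print(complexity_gate("Clear Reply", ["Clear Reply", "Ambivalent"], [520, 610]))
# The models agree, so the ensemble label passes through unchanged.
print(complexity_gate("Clear Reply", ["Clear Reply", "Clear Reply"], [520, 610]))
```

Requiring both signals keeps the gate conservative in this sketch. Whether the real mechanism combines its signals this way is not stated in the abstract.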

Methodological contributions

The paper also evaluates multi-agent debate as an alternative way to increase deliberative capacity and contrasts it with DCG: debate adds agents without adding model diversity, whereas DCG gates reasoning adaptively using cross-model behavioral signals.

Demerits

Limitation in dataset scope

The system's performance may not generalize to other datasets or domains, limiting its applicability and requiring further testing and validation.

Lack of detailed interpretability

The paper could benefit from a more detailed analysis of the interpretability of the proposed system, providing insights into the decision-making process and the effectiveness of DCG.

Comparison to other methods

A more comprehensive comparison to other state-of-the-art methods for detecting political evasion would strengthen the paper's contributions and provide a clearer understanding of its relative performance.

Expert Commentary

The paper presents a novel approach to classifying response clarity in political interviews, and its 3rd-place Macro-F1 score of 0.85 indicates that the approach is competitive on this task. The main contribution is Deliberative Complexity Gating (DCG), which uses cross-model behavioral signals to correct ensemble predictions on ambiguous samples. However, the paper's limitations, such as the scope of the dataset and the lack of detailed interpretability analysis, require further attention. The findings are relevant to practical tools for analysing political discourse.

Recommendations

  • Future research should focus on adapting the proposed system to other datasets and domains, exploring its applicability and generalizability.
  • A more detailed analysis of the interpretability of the proposed system, including the decision-making process and the effectiveness of DCG, would provide valuable insights into the system's performance and limitations.
