Conflict-Free Policy Languages for Probabilistic ML Predicates: A Framework and Case Study with the Semantic Router DSL

Xunzhuo Liu, Hao Wu, Huamin Chen, Bowei He, Xue Liu

arXiv:2603.18174v1

Abstract: Conflict detection in policy languages is a solved problem -- as long as every rule condition is a crisp Boolean predicate. BDDs, SMT solvers, and NetKAT all exploit that assumption. But a growing class of routing and access-control systems base their decisions on probabilistic ML signals: embedding similarities, domain classifiers, complexity estimators. Two such signals, declared over categories the author intended to be disjoint, can both clear their thresholds on the same query and silently route it to the wrong model. Nothing in the compiler warns about this. We characterize the problem as a three-level decidability hierarchy -- crisp conflicts are decidable via SAT, embedding conflicts reduce to spherical cap intersection, and classifier conflicts are undecidable without distributional knowledge -- and show that for the embedding case, which dominates in practice, replacing independent thresholding with a temperature-scaled softmax partitions the embedding space into Voronoi regions where co-firing is impossible. No model retraining is needed. We implement the detection and prevention mechanisms in the Semantic Router DSL, a production routing language for LLM inference, and discuss how the same ideas apply to semantic RBAC and API gateway policy.

Executive Summary

The article examines policy languages whose rule conditions are probabilistic machine learning (ML) predicates rather than crisp Boolean ones. Two signals declared over categories intended to be disjoint can both clear their thresholds on the same query and route it to the wrong model, with no compiler warning. The authors organize the problem into a three-level decidability hierarchy and show that, for embedding-based signals, replacing independent thresholding with a temperature-scaled softmax partitions the embedding space into Voronoi regions where co-firing is impossible, without retraining any model. The detection and prevention mechanisms are implemented in the Semantic Router DSL, a production routing language for large language model (LLM) inference, and the authors discuss how the same ideas apply to semantic role-based access control (RBAC) and API gateway policy.

Key Points

  • The authors identify a gap in policy languages for probabilistic ML predicates: two signals declared over supposedly disjoint categories can both clear their thresholds on the same query, silently routing it to the wrong model.
  • For embedding-based signals, the proposed fix replaces independent thresholding with a temperature-scaled softmax, which partitions the embedding space into Voronoi regions where co-firing is impossible. No model retraining is needed.
  • The approach is implemented in the Semantic Router DSL, a production routing language for LLM inference.
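
The contrast drawn in the key points above can be illustrated with a minimal sketch. This is not the Semantic Router DSL or the authors' implementation; the centroid vectors, thresholds, temperature, and the function names `fire_independent` and `fire_softmax` are all hypothetical, chosen only to show why independent thresholds can co-fire while a softmax over the same similarities cannot. The sketch assumes each signal fires when its softmax probability exceeds 0.5; because the probabilities sum to 1, at most one signal can do so.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def fire_independent(query, centroids, thresholds):
    """Each signal fires on its own: similarity >= its threshold."""
    return [name for name, c in centroids.items()
            if cosine(query, c) >= thresholds[name]]

def fire_softmax(query, centroids, temperature=0.1, min_prob=0.5):
    """Signals compete: a signal fires only if its softmax share > min_prob."""
    if min_prob < 0.5:
        raise ValueError("min_prob must be >= 0.5 to rule out co-firing")
    names = list(centroids)
    logits = [cosine(query, centroids[n]) / temperature for n in names]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [n for n, e in zip(names, exps) if e / total > min_prob]

# Hypothetical two-category policy with nearby centroids (cosine 0.8 apart).
centroids = {"math": (1.0, 0.0), "code": (0.8, 0.6)}
thresholds = {"math": 0.8, "code": 0.8}
query = (0.97, 0.24)

print(fire_independent(query, centroids, thresholds))  # ['math', 'code']
print(fire_softmax(query, centroids))                  # ['math']
```

A lower temperature sharpens the softmax so that the nearest centroid takes almost all of the probability mass; a higher one flattens it, so more queries near a boundary fire no signal at all. Either way, two signals never fire together under this rule.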

Merits

Comprehensive Framework

The authors frame conflict detection as a three-level decidability hierarchy: crisp conflicts are decidable via SAT, embedding conflicts reduce to spherical cap intersection, and classifier conflicts are undecidable without distributional knowledge.
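
The middle level of that hierarchy can be made concrete with a short sketch. It assumes each embedding signal fires when the cosine similarity between the query and a centroid is at least a threshold, so the firing region is a spherical cap on the unit sphere. Two caps share a point exactly when the angle between their centroids is no larger than the sum of their angular radii. The function name `caps_intersect` and the example vectors are hypothetical and are not taken from the paper or the Semantic Router DSL.

```python
import math

def caps_intersect(centroid_a, threshold_a, centroid_b, threshold_b):
    """Return True if some unit vector can satisfy both cosine thresholds.

    Assumes thresholds lie in [-1, 1] and embeddings have dimension >= 2.
    """
    dot = sum(a * b for a, b in zip(centroid_a, centroid_b))
    norm_a = math.sqrt(sum(a * a for a in centroid_a))
    norm_b = math.sqrt(sum(b * b for b in centroid_b))
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_sep = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    separation = math.acos(cos_sep)   # angle between the two centroids
    radius_a = math.acos(threshold_a)  # angular radius of cap A
    radius_b = math.acos(threshold_b)  # angular radius of cap B
    return separation <= radius_a + radius_b

# Nearby centroids with loose thresholds: the caps overlap, so co-firing is possible.
print(caps_intersect((1.0, 0.0), 0.8, (0.8, 0.6), 0.8))    # True
# Orthogonal centroids with tight thresholds: no query can trigger both.
print(caps_intersect((1.0, 0.0), 0.95, (0.0, 1.0), 0.95))  # False
```

A check of this kind needs only the centroids and thresholds declared in the policy, which is why it can run at compile time without seeing any queries.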

Practical Implementation

The detection and prevention mechanisms are implemented in the Semantic Router DSL, a production routing language for LLM inference, and the softmax fix requires no model retraining, which lowers the cost of adopting it in existing deployments.

Implications for Policy

The authors discuss how the same ideas apply to semantic RBAC and API gateway policy, two other settings where decisions may rest on probabilistic ML signals.

Demerits

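Classifier Conflicts Require Distributional Knowledge

According to the abstract, classifier conflicts are undecidable without distributional knowledge, which may not always be available in real-world scenarios. The softmax fix itself does not depend on such knowledge, but it does not resolve this case.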

Limited to Embedding Conflicts

The co-firing guarantee covers only embedding conflicts. The authors state that this case dominates in practice, but policies that also rely on domain classifiers or complexity estimators are not covered by the same guarantee.

Expert Commentary

The article's main contribution is to show that conflict detection, long treated as solved for crisp Boolean predicates, reopens once rule conditions depend on probabilistic ML signals, and to organize the resulting cases into a decidability hierarchy. Replacing independent thresholding with a temperature-scaled softmax is a practical fix for the embedding case because it needs no model retraining. Its guarantee does not extend to classifier conflicts, which the authors describe as undecidable without distributional knowledge that may not be available in real-world scenarios. Even so, the hierarchy gives a useful way to reason about which conflicts a compiler can detect, and it points to classifier-based signals as the main open area for further research.

Recommendations

  • Future research should extend conflict detection and prevention beyond the embedding case, in particular to classifier conflicts, where distributional knowledge is needed.
  • The authors should evaluate how well the approach transfers to the other settings they mention, semantic RBAC and API gateway policy, and to further domains that rely on probabilistic ML signals.
