FactorSmith: Agentic Simulation Generation via Markov Decision Process Decomposition with Planner-Designer-Critic Refinement
arXiv:2603.20270v1
Abstract: Generating executable simulations from natural language specifications remains a challenging problem due to the limited reasoning capacity of large language models (LLMs) when confronted with large, interconnected codebases. This paper presents FactorSmith, a framework that synthesizes playable game simulations in code from textual descriptions by combining two complementary ideas: factored POMDP decomposition for principled context reduction and a hierarchical planner-designer-critic agentic workflow for iterative quality refinement at every generation step. Drawing on the factored partially observable Markov decision process (POMDP) representation introduced by FactorSim [Sun et al., 2024], the proposed method decomposes a simulation specification into modular steps where each step operates only on a minimal subset of relevant state variables, limiting the context window that any single LLM call must process. Inspired by the agentic trio architecture of SceneSmith [Pfaff et al., 2025], FactorSmith embeds within every factored step a three-agent interaction: a planner that orchestrates workflow, a designer that proposes code artifacts, and a critic that evaluates quality through structured scoring, enabling iterative refinement with checkpoint rollback. This paper formalizes the combined approach, presents the mathematical framework underpinning context selection and agentic refinement, and describes the open-source implementation. Experiments on the PyGame Learning Environment benchmark demonstrate that FactorSmith generates simulations with improved prompt alignment, fewer runtime errors, and higher code quality compared to non-agentic factored baselines.
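For readers unfamiliar with factored POMDPs, the standard factored form of the transition model is sketched below. The notation (state variables $s^i$, parent sets $\mathrm{Pa}(i)$) is assumed for illustration and is not taken from the FactorSmith or FactorSim papers.

```latex
% State split into n variables; each next-state variable depends only on a parent subset.
s = (s^{1}, \dots, s^{n}), \qquad
T(s' \mid s, a) = \prod_{i=1}^{n} T_i\!\left(s'^{\,i} \mid s^{\mathrm{Pa}(i)}, a\right),
\qquad \mathrm{Pa}(i) \subseteq \{1, \dots, n\}
```

A generation step that implements one factor $T_i$ therefore only needs the variables indexed by $\{i\} \cup \mathrm{Pa}(i)$ in its prompt, rather than all $n$ state variables; this is the sense in which factoring limits the context of each LLM call.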
Executive Summary
FactorSmith is a framework for synthesizing playable game simulations in code from textual descriptions. It combines factored POMDP decomposition, which restricts each generation step to a minimal subset of relevant state variables and thereby limits the context any single LLM call must process, with a hierarchical planner-designer-critic agentic workflow that iteratively refines every step. On the PyGame Learning Environment benchmark, it achieves better prompt alignment, fewer runtime errors, and higher code quality than non-agentic factored baselines. The framework has clear potential applications in AI-driven content creation and game development.
Key Points
- ▸ FactorSmith integrates factored POMDP decomposition and a hierarchical agentic workflow to generate simulations from textual descriptions.
- ▸ The framework decomposes a simulation specification into modular steps, each operating only on a minimal subset of relevant state variables, which limits the context any single LLM call must process.
- ▸ Within each step, a planner orchestrates the workflow, a designer proposes code, and a critic evaluates it through structured scoring; the step is refined iteratively, with rollback to earlier checkpoints.
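The per-step refinement loop can be sketched as follows. This is a minimal illustration under stated assumptions, not FactorSmith's implementation: the designer and critic are plain Python callables standing in for LLM agents, the planner's role is reduced to a stopping rule, and the names `refine_step`, `Checkpoint`, `accept_at`, and `max_iters` are hypothetical. A proposal becomes the new checkpoint only if the critic scores it strictly higher; otherwise the loop rolls back to the best checkpoint so far.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical agent interfaces; in FactorSmith these roles are played by LLM agents.
Designer = Callable[[str, str], str]           # (current code, critic feedback) -> proposal
Critic = Callable[[str], Tuple[float, str]]    # code -> (score in [0, 1], feedback)


@dataclass
class Checkpoint:
    code: str
    score: float


def refine_step(initial_code: str, designer: Designer, critic: Critic,
                accept_at: float = 0.9, max_iters: int = 5) -> Checkpoint:
    """Iterate designer proposals and critic scores, rolling back on regressions."""
    score, feedback = critic(initial_code)
    best = Checkpoint(initial_code, score)
    current = initial_code
    for _ in range(max_iters):
        if best.score >= accept_at:  # stand-in for the planner deciding the step is done
            break
        proposal = designer(current, feedback)
        score, feedback = critic(proposal)
        if score > best.score:
            best = Checkpoint(proposal, score)  # strictly better: save a new checkpoint
            current = proposal
        else:
            current = best.code  # regression: roll back; feedback on the rejected try is kept
    return best


# Toy critic and designer for a single hypothetical "apply_gravity" step.
REQUIRED = ["bird_vel += GRAVITY", "bird_y += bird_vel"]


def toy_critic(code: str) -> Tuple[float, str]:
    missing = [line for line in REQUIRED if line not in code]
    return 1 - len(missing) / len(REQUIRED), (missing[0] if missing else "")


def toy_designer(code: str, feedback: str) -> str:
    return (code + "\n" + feedback).strip()


best = refine_step("", toy_designer, toy_critic)
print(best.score)  # 1.0
```

A designer that only produces regressions leaves the starting checkpoint in place: `refine_step("bird_vel += GRAVITY", lambda c, f: "oops", toy_critic)` returns the original code with score 0.5.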
Merits
Strength in Context Reduction
FactorSmith's factored POMDP decomposition limits the context that any single LLM call must process, so the model reasons over a small set of relevant state variables instead of the entire interconnected codebase.
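A minimal sketch of factored context selection, assuming each step declares which state variables it reads and writes. `FactoredStep`, `build_context`, and the Flappy-Bird-like variables are hypothetical illustrations; FactorSmith's actual selection procedure may differ. Only the code for a step's relevant variables is placed in that step's prompt.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class FactoredStep:
    """A hypothetical generation step with the state variables it reads and writes."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]

    def relevant_vars(self) -> FrozenSet[str]:
        return self.reads | self.writes


def build_context(step: FactoredStep, state_code: Dict[str, str]) -> Dict[str, str]:
    """Keep only the declarations of variables this step touches, in sorted order."""
    return {v: state_code[v] for v in sorted(step.relevant_vars())}


# Hypothetical Flappy-Bird-like state: variable name -> code that declares it.
state_code = {
    "bird_y": "bird_y = 200",
    "bird_vel": "bird_vel = 0.0",
    "pipe_x": "pipe_x = 400",
    "pipe_gap_y": "pipe_gap_y = 150",
    "score": "score = 0",
}

flap = FactoredStep("handle_flap_input", frozenset({"bird_vel"}), frozenset({"bird_vel"}))
gravity = FactoredStep("apply_gravity", frozenset({"bird_y", "bird_vel"}),
                       frozenset({"bird_y", "bird_vel"}))

for step in (flap, gravity):
    ctx = build_context(step, state_code)
    print(step.name, list(ctx), f"({len(ctx)}/{len(state_code)} variables)")
# handle_flap_input ['bird_vel'] (1/5 variables)
# apply_gravity ['bird_vel', 'bird_y'] (2/5 variables)
```

Declaring reads and writes explicitly keeps selection deterministic and cheap; the pipe and score variables never enter the gravity step's prompt.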
Agentic Workflow for Iterative Refinement
The hierarchical planner-designer-critic workflow facilitates iterative quality refinement, improving simulation quality and reducing runtime errors.
Improved Code Quality
The critic's structured scoring, combined with checkpoint rollback, contributes to the higher code quality and improved prompt alignment reported relative to non-agentic factored baselines.
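One way a critic's structured scores could feed an accept-or-rollback decision is sketched below. The three criteria mirror the metrics the paper reports (prompt alignment, runtime errors, code quality), but the 0-10 scale, the weights, the acceptance threshold, and the per-criterion floor are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict

# Assumed weights; the criteria echo the reported metrics, the numbers do not come from the paper.
WEIGHTS = {"prompt_alignment": 0.4, "runs_without_error": 0.4, "code_quality": 0.2}


@dataclass
class CriticReport:
    scores: Dict[str, float]  # each criterion scored on an assumed 0-10 scale
    feedback: str

    def aggregate(self) -> float:
        """Weighted mean of the criterion scores, normalized to [0, 1]."""
        total = sum(w * self.scores[name] for name, w in WEIGHTS.items())
        return total / (10 * sum(WEIGHTS.values()))

    def passes(self, threshold: float = 0.8, floor: float = 5.0) -> bool:
        """Accept only if the aggregate clears the threshold and no criterion is below the floor."""
        return self.aggregate() >= threshold and min(self.scores.values()) >= floor


good = CriticReport({"prompt_alignment": 9, "runs_without_error": 10, "code_quality": 6}, "ok")
weak = CriticReport({"prompt_alignment": 9, "runs_without_error": 10, "code_quality": 4},
                    "extract magic numbers into constants")
print(round(good.aggregate(), 2), good.passes())  # 0.88 True
print(round(weak.aggregate(), 2), weak.passes())  # 0.84 False
```

The floor keeps a high weighted average from masking one failing criterion, which is the case `weak` illustrates.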
Demerits
Limited Generalizability
The reported experiments use the PyGame Learning Environment benchmark; performance on more diverse simulation specifications and larger, more complex codebases is not established and may require further adaptation.
Dependence on Large Language Models
FactorSmith's effectiveness depends on the capabilities of the underlying large language models; limitations in their reasoning or code generation for particular kinds of input or context carry over to the generated simulations.
Computational Resource Intensity
Because each factored step may involve several planner, designer, and critic interactions plus rollbacks, the framework can require substantially more LLM calls and compute than a single-pass baseline, which may limit scalability.
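The scaling concern can be made concrete with a rough call count. This is back-of-envelope arithmetic under assumed parameters, not a measurement from the paper: it assumes one planner call per step, one designer and one critic call per refinement iteration, and a non-agentic factored baseline that makes a single call per step. The function names are hypothetical.

```python
def agentic_calls(num_steps: int, max_iters: int, planner_calls: int = 1) -> int:
    # Worst case per step: planner call(s), then max_iters designer+critic pairs.
    return num_steps * (planner_calls + 2 * max_iters)


def baseline_calls(num_steps: int) -> int:
    # Assumed non-agentic factored baseline: one generation call per step.
    return num_steps


steps, iters = 8, 3  # illustrative values, not from the paper
print(agentic_calls(steps, iters), baseline_calls(steps))   # 56 8
print(agentic_calls(steps, iters) / baseline_calls(steps))  # 7.0
```

Under these assumptions the worst-case overhead grows linearly with the iteration cap, so capping iterations is the most direct cost lever.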
Expert Commentary
FactorSmith advances executable simulation generation by combining factored POMDP decomposition with hierarchical agentic workflows. Although its generalizability and computational cost need further study, the reported gains in prompt alignment, runtime reliability, and code quality are promising. As AI-driven content creation and game development continue to evolve, FactorSmith's applications and implications will warrant close attention from researchers, practitioners, and policymakers.
Recommendations
- ✓ Further research should focus on adapting FactorSmith for diverse simulation specifications and complex codebases, exploring its potential in AI-driven applications beyond game development.
- ✓ The framework's computational resource intensity should be addressed through optimization techniques and distributed computing approaches, enhancing its scalability and practicality.
Sources
Original: arXiv - cs.AI