LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels
arXiv:2603.19312v1 Announce Type: new

Abstract: Joint Embedding Predictive Architectures (JEPAs) offer a compelling framework for learning world models in compact latent spaces, yet existing methods remain fragile, relying on complex multi-term losses, exponential moving averages, pre-trained encoders, or auxiliary supervision to avoid representation collapse. In this work, we introduce LeWorldModel (LeWM), the first JEPA that trains stably end-to-end from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. This reduces tunable loss hyperparameters from six to one compared to the only existing end-to-end alternative. With ~15M parameters trainable on a single GPU in a few hours, LeWM plans up to 48x faster than foundation-model-based world models while remaining competitive across diverse 2D and 3D control tasks. Beyond control, we show that LeWM's latent space encodes meaningful physical structure through probing of physical quantities. Surprise evaluation confirms that the model reliably detects physically implausible events.
Executive Summary
This article presents LeWorldModel (LeWM), a Joint Embedding Predictive Architecture (JEPA) that achieves stable end-to-end training from raw pixels using only two loss terms: a next-embedding prediction loss and a regularizer enforcing Gaussian-distributed latent embeddings. Compared with the only existing end-to-end alternative, LeWM reduces tunable loss hyperparameters from six to one, and with ~15M parameters it trains on a single GPU in a few hours while planning up to 48x faster than foundation-model-based world models, remaining competitive across diverse 2D and 3D control tasks. The model's latent space is shown to encode meaningful physical structure, enabling reliable detection of physically implausible events. While LeWorldModel demonstrates a significant advancement in world modeling, its scalability and applicability in real-world scenarios remain to be explored. This research points toward more efficient and effective learning of complex world models.
Key Points
- ▸ LeWorldModel is the first JEPA that trains stably end-to-end from raw pixels using only two loss terms
- ▸ The model reduces the number of tunable loss hyperparameters from six to one
- ▸ LeWorldModel remains competitive across diverse 2D and 3D control tasks while planning up to 48x faster than foundation-model-based world models
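The two-term objective summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the exact form of the Gaussian regularizer (here, pushing batch mean and covariance toward N(0, I)) and the names `lewm_loss` and `lam` are assumptions made for clarity.

```python
import numpy as np

def lewm_loss(pred_next, target_next, z_batch, lam=1.0):
    """Sketch of a two-term JEPA objective (assumed form, not the paper's exact loss).

    pred_next:   (B, D) predicted next-step embeddings
    target_next: (B, D) encoder embeddings of the actual next frames
    z_batch:     (B, D) current-batch embeddings to regularize toward N(0, I)
    lam:         the single tunable loss weight
    """
    # Term 1: next-embedding prediction error (mean squared error).
    pred_loss = np.mean((pred_next - target_next) ** 2)

    # Term 2: Gaussian regularizer -- push batch statistics toward zero mean
    # and identity covariance so the latents stay spread out (an assumed
    # concrete form; this is what discourages collapse to a constant).
    mu = z_batch.mean(axis=0)
    cov = np.cov(z_batch, rowvar=False)
    d = z_batch.shape[1]
    reg = np.sum(mu ** 2) + np.sum((cov - np.eye(d)) ** 2)

    return pred_loss + lam * reg
```

Note the mechanism: a collapsed batch (all embeddings identical) makes the prediction term trivially small, but its covariance is zero, so the regularizer stays large. With only `lam` to tune, this matches the paper's claim of a single loss hyperparameter.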
Merits
Training Stability
LeWorldModel achieves stable end-to-end training from raw pixels with a two-term loss (next-embedding prediction plus a Gaussian regularizer), avoiding representation collapse without exponential moving averages, pre-trained encoders, or auxiliary supervision
Competitive Performance
LeWorldModel demonstrates competitive performance across diverse 2D and 3D control tasks while planning up to 48x faster than foundation-model-based world models
Physical Structure Encoding
The model's latent space encodes meaningful physical structure, as demonstrated by probing of physical quantities, and surprise evaluation confirms reliable detection of physically implausible events
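The surprise evaluation mentioned above can be understood as thresholding the model's own prediction error in latent space: a physically implausible event produces a next frame whose embedding the predictor did not anticipate. The sketch below assumes this interpretation; the function names and the mean-plus-k-sigma threshold rule are illustrative choices, not the paper's method.

```python
import numpy as np

def surprise_scores(pred_next, actual_next):
    """Per-step surprise as latent prediction error (assumed metric:
    squared L2 distance between predicted and encoded next embeddings)."""
    return np.sum((pred_next - actual_next) ** 2, axis=1)

def flag_implausible(scores, k=2.0):
    """Flag steps whose surprise exceeds mean + k * std over the trajectory
    (a simple threshold rule chosen for illustration)."""
    threshold = scores.mean() + k * scores.std()
    return scores > threshold
```

On a trajectory where the world behaves as learned, scores stay near their baseline; a single implausible transition stands out as an outlier and is flagged.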
Demerits
Scalability
The scalability and applicability of LeWorldModel in real-world scenarios remain to be explored
Evaluation on Real-World Data
The model's performance on real-world data and its ability to generalize to diverse environments require further evaluation
Interpretability and Explainability
The interpretability and explainability of LeWorldModel's latent space and decision-making processes require further investigation
Expert Commentary
LeWorldModel represents a significant advancement in the field of artificial intelligence, demonstrating the potential for end-to-end learning of complex world models from raw pixels. The model's ability to encode meaningful physical structure in its latent space and detect physically implausible events is a notable achievement. However, the scalability and applicability of LeWorldModel in real-world scenarios require further evaluation and investigation. As the field continues to evolve, it is essential to explore the implications of LeWorldModel on various industries and policy domains, ensuring a responsible and beneficial development of artificial intelligence.
Recommendations
- ✓ Future research should focus on exploring the scalability and applicability of LeWorldModel in real-world scenarios and evaluating its performance on diverse environments
- ✓ Investigate the interpretability and explainability of LeWorldModel's latent space and decision-making processes to ensure a deeper understanding of the model's behavior and limitations
- ✓ Explore the potential applications of LeWorldModel in various industries and policy domains, ensuring a responsible and beneficial development of artificial intelligence
Sources
Original: arXiv - cs.LG