ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation

Serry Sibaee, Khloud Al Jallad, Zineb Yousfi, Israa Elsayed Elhosiny, Yousra El-Ghawi, Batool Balah, Omer Nacar · April 3, 2026

#cs.CL

arXiv:2604.00015v1 · Abstract: We present ASCAT (Arabic Scientific Corpus for Advanced Translation), a high-quality English-Arabic parallel benchmark corpus for scientific translation evaluation, constructed through a systematic multi-engine translation and human validation pipeline. Unlike existing Arabic-English corpora that rely on short sentences or single-domain text, ASCAT targets full scientific abstracts averaging 141.7 words (English) and 111.78 words (Arabic), drawn from five scientific domains: physics, mathematics, computer science, quantum mechanics, and artificial intelligence. Each abstract was translated using three complementary architectures: generative AI (Gemini), transformer-based models (Hugging Face quickmt-en-ar), and commercial MT APIs (Google Translate, DeepL). The translations were subsequently validated by domain experts at the lexical, syntactic, and semantic levels. The resulting corpus contains 67,293 English tokens and 60,026 Arabic tokens, with an Arabic vocabulary of 17,604 unique words reflecting the morphological richness of the language. We benchmark three state-of-the-art LLMs on the corpus, GPT-4o-mini (BLEU: 37.07), Gemini-3.0-Flash-Preview (BLEU: 30.44), and Qwen3-235B-A22B (BLEU: 23.68), demonstrating its discriminative power as an evaluation benchmark. ASCAT addresses a critical gap in scientific MT resources for Arabic and is designed to support rigorous evaluation of scientific translation quality and training of domain-specific translation models.
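The LLM results above are reported as BLEU scores. As a rough illustration of what that metric computes, here is a minimal corpus-level BLEU sketch in Python: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It assumes one reference per segment, plain whitespace tokenization, and no smoothing. The abstract does not state the paper's tokenizer or scoring toolkit, so this sketch will not reproduce the reported numbers exactly, and the names `ngrams` and `corpus_bleu` are illustrative rather than taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per segment.

    Assumes whitespace tokenization and no smoothing, so a corpus with
    zero matches at any n-gram order scores 0.
    """
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

In practice an established implementation such as sacreBLEU is preferable, because BLEU values are only comparable across papers when tokenization and other scoring settings match.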

Executive Summary

The ASCAT corpus advances Arabic scientific translation evaluation by introducing a high-quality, domain-specific parallel benchmark. Unlike prior corpora constrained to short sentences or single domains, ASCAT uses full scientific abstracts (avg. 141.7 words in English, 111.78 in Arabic) from five scientific fields (physics, mathematics, computer science, quantum mechanics, and AI), preserving the multi-sentence context that real scientific translation involves. Its pipeline translates each abstract with generative AI (Gemini), a transformer-based model (Hugging Face quickmt-en-ar), and commercial APIs (Google Translate, DeepL), after which domain experts validate the output at the lexical, syntactic, and semantic levels, strengthening the credibility of the references. With 67,293 English tokens, 60,026 Arabic tokens, and an Arabic vocabulary of 17,604 unique words, ASCAT is compact but lexically diverse. Benchmarking three LLMs (GPT-4o-mini, Gemini-3.0-Flash-Preview, Qwen3-235B-A22B) yields BLEU scores ranging from 37.07 down to 23.68, a spread that supports its use as a discriminative benchmark. The corpus fills a gap in Arabic scientific MT resources and supports both evaluation and model training.
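The token and vocabulary figures can be related through the type-token ratio (unique words divided by total tokens). The sketch below assumes whitespace tokenization, since the paper's tokenizer is not described here; `corpus_stats` is a hypothetical helper, and the final lines apply the ratio to the reported Arabic counts (17,604 / 60,026).

```python
def corpus_stats(documents):
    """Whitespace-token statistics for a list of documents such as abstracts."""
    tokens = [tok for doc in documents for tok in doc.split()]
    if not tokens:
        raise ValueError("corpus contains no tokens")
    types = set(tokens)  # unique word forms
    return {
        "documents": len(documents),
        "tokens": len(tokens),
        "types": len(types),
        "avg_tokens_per_doc": len(tokens) / len(documents),
        "type_token_ratio": len(types) / len(tokens),
    }


# Type-token ratio implied by the reported Arabic counts.
arabic_ttr = 17_604 / 60_026
print(f"Arabic type-token ratio: {arabic_ttr:.3f}")  # Arabic type-token ratio: 0.293
```

Because the type-token ratio falls as a corpus grows, it is only a fair measure of lexical diversity when comparing corpora of similar size.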

Key Points

▸ ASCAT uses full scientific abstracts rather than short sentences or single-domain texts
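▸ It incorporates multi-engine translation (Gemini, Hugging Face quickmt-en-ar, Google Translate, DeepL) followed by expert human validation
▸ The corpus is benchmarked on three LLMs (GPT-4o-mini, Gemini-3.0-Flash-Preview, Qwen3-235B-A22B), whose BLEU scores range from 37.07 to 23.68, demonstrating its utility as an evaluation tool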

Merits

Comprehensive Domain Coverage

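ASCAT’s coverage of five scientific domains broadens its applicability across scientific translation contexts, although quantum mechanics and AI are subfields of physics and computer science, respectively, so the domains are not fully distinct.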

Robust Validation Process

Validation by domain experts at three linguistic levels (lexical, syntactic, and semantic) improves the quality and reliability of the reference translations.
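One way to represent the multi-engine, multi-level validation described above is a per-abstract record that keeps every engine's candidate alongside the expert verdicts. The sketch below is a hypothetical schema: the class, field, and engine-key names are assumptions, and it does not model how the paper's experts selected or post-edited the final Arabic reference.

```python
from dataclasses import dataclass, field

# The three validation levels named in the ASCAT abstract.
LEVELS = ("lexical", "syntactic", "semantic")


@dataclass
class AbstractRecord:
    """Hypothetical record for one English abstract and its Arabic translations."""

    abstract_id: str
    source_en: str
    # Engine name -> candidate Arabic translation (hypothetical keys).
    candidates: dict[str, str]
    # Expert-approved Arabic reference; empty until validation is recorded.
    final_ar: str = ""
    # Validation level -> whether the reference passed that level.
    checks: dict[str, bool] = field(default_factory=dict)

    def approve(self, text: str, **levels: bool) -> None:
        """Store the approved reference and per-level results; missing levels count as failed."""
        unknown = set(levels) - set(LEVELS)
        if unknown:
            raise ValueError(f"unknown validation level(s): {sorted(unknown)}")
        self.final_ar = text
        self.checks = {level: levels.get(level, False) for level in LEVELS}

    @property
    def validated(self) -> bool:
        """True only if a reference exists and it passed all three levels."""
        return bool(self.final_ar) and all(self.checks.get(level, False) for level in LEVELS)


record = AbstractRecord(
    abstract_id="example-001",
    source_en="We study quantum entanglement.",
    candidates={"gemini": "<Gemini output>", "deepl": "<DeepL output>"},
)
record.approve("<expert-approved Arabic>", lexical=True, syntactic=True, semantic=True)
print(record.validated)  # True
```

Keeping all candidates next to the approved reference would also make it possible to study later which engine the experts' final text most resembles.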

Demerits

Scalability Constraint

At 67,293 English tokens (60,026 Arabic), the corpus is well sized for evaluation but may be too small to train large models on its own without augmentation.

Expert Commentary

ASCAT is a significant contribution to computational linguistics and scientific translation. Its design reflects a clear understanding of the challenges posed by Arabic morphology and by the complexity of scientific content. The decision to target full abstracts rather than isolated sentences is particularly well founded: it mirrors real-world translation scenarios and allows evaluation of semantic coherence across sentences, not only lexical alignment within them. The multi-engine translation strategy, combined with expert validation, forms a hybrid approach that balances scalability with quality assurance. Beyond ranking current LLMs, the benchmark results (BLEU from 37.07 for GPT-4o-mini down to 23.68 for Qwen3-235B-A22B) suggest that domain-specific Arabic scientific text remains challenging even for top-tier models, which underscores the need for domain-adapted training data. The corpus does more than fill a gap: it offers a candidate standard for evaluating scientific translation quality in non-Western languages and a model for future corpus development in other underrepresented languages.

Recommendations

✓ Extend ASCAT to additional scientific domains such as medicine or engineering to broaden applicability.
✓ Develop a standardized evaluation protocol based on ASCAT for use in academic research and industry MT evaluation.
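A standardized protocol would need to fix the settings that BLEU is sensitive to before systems are ranked. The sketch below is hypothetical: `EvalProtocol`, `rank_systems`, and the default settings (whitespace tokenization, case-sensitive scoring, English-to-Arabic direction) are assumptions rather than settings reported for ASCAT, while the three scores are the ones given in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical settings a shared ASCAT protocol would pin down (immutable)."""

    metric: str = "BLEU"
    max_ngram: int = 4
    tokenizer: str = "whitespace"
    lowercase: bool = False
    direction: str = "en->ar"


def rank_systems(scores):
    """Return (system, score) pairs sorted from best to worst."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# BLEU scores reported in the ASCAT abstract.
reported_bleu = {
    "GPT-4o-mini": 37.07,
    "Gemini-3.0-Flash-Preview": 30.44,
    "Qwen3-235B-A22B": 23.68,
}

protocol = EvalProtocol()
ranking = rank_systems(reported_bleu)
spread = ranking[0][1] - ranking[-1][1]
print(protocol)
for system, score in ranking:
    print(f"{system}: {score:.2f}")
print(f"best-worst spread: {spread:.2f} BLEU")  # best-worst spread: 13.39 BLEU
```

Freezing the protocol object makes it harder to change a setting silently between runs, which is the main way BLEU comparisons drift apart.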

Sources

Original: arXiv - cs.CL

ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation