ASCAT: An Arabic Scientific Corpus and Benchmark for Advanced Translation Evaluation
arXiv:2604.00015v1. Abstract: We present ASCAT (Arabic Scientific Corpus for Advanced Translation), a high-quality English-Arabic parallel benchmark corpus designed for scientific translation evaluation and constructed through a systematic multi-engine translation and human validation pipeline. Unlike existing Arabic-English corpora that rely on short sentences or single-domain text, ASCAT targets full scientific abstracts averaging 141.7 words (English) and 111.78 words (Arabic), drawn from five scientific domains: physics, mathematics, computer science, quantum mechanics, and artificial intelligence. Each abstract was translated using three complementary architectures: generative AI (Gemini), transformer-based models (Hugging Face quickmt-en-ar), and commercial MT APIs (Google Translate, DeepL), and subsequently validated by domain experts at the lexical, syntactic, and semantic levels. The resulting corpus contains 67,293 English tokens and 60,026 Arabic tokens, with an Arabic vocabulary of 17,604 unique words reflecting the morphological richness of the language. We benchmark three state-of-the-art LLMs on the corpus: GPT-4o-mini (BLEU: 37.07), Gemini-3.0-Flash-Preview (BLEU: 30.44), and Qwen3-235B-A22B (BLEU: 23.68), demonstrating its discriminative power as an evaluation benchmark. ASCAT addresses a critical gap in scientific MT resources for Arabic and is designed to support rigorous evaluation of scientific translation quality and training of domain-specific translation models.
Executive Summary
The ASCAT corpus introduces a high-quality, domain-specific English-Arabic parallel benchmark for evaluating scientific translation. Unlike prior corpora constrained by short sentences or a single domain, ASCAT uses full scientific abstracts (averaging 141.7 words in English and 111.78 in Arabic) drawn from five scientific fields: physics, mathematics, computer science, quantum mechanics, and AI. Its pipeline combines generative AI (Gemini), a transformer-based model (Hugging Face quickmt-en-ar), and commercial APIs (Google Translate, DeepL), followed by validation by domain experts at the lexical, syntactic, and semantic levels. The corpus contains 67,293 English tokens and 60,026 Arabic tokens, with an Arabic vocabulary of 17,604 unique words. The BLEU scores of the three benchmarked LLMs, GPT-4o-mini (37.07), Gemini-3.0-Flash-Preview (30.44), and Qwen3-235B-A22B (23.68), are spread widely enough to suggest that the corpus can discriminate between systems. ASCAT addresses a gap in Arabic scientific MT resources and is intended to support both evaluation and the training of domain-specific models.
Key Points
- ASCAT uses full scientific abstracts from five domains rather than short sentences or single-domain texts
- It combines multi-engine translation (Gemini, Hugging Face quickmt-en-ar, Google Translate, DeepL) with human validation by domain experts
- Three LLMs are benchmarked on the corpus, with BLEU scores ranging from 23.68 to 37.07, demonstrating its utility as an evaluation tool
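To make the benchmark figures concrete, the following is a minimal sketch of how a corpus-level BLEU score can be computed. It is not the authors' evaluation code: the abstract does not say which BLEU implementation, tokenizer, or smoothing was used, so this sketch assumes pre-tokenized input, one reference per hypothesis, 4-gram BLEU, and no smoothing. The function names `ngrams` and `corpus_bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    hypotheses and references are parallel lists of token lists, with exactly
    one reference per hypothesis (an assumption of this sketch).
    """
    matches = [0] * max_n  # clipped n-gram matches, one slot per order
    totals = [0] * max_n   # hypothesis n-gram counts, one slot per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    # Without smoothing, BLEU is zero as soon as one n-gram order has no match.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# Toy example with English tokens; a real run would pass Arabic system output
# and the validated Arabic references.
hypothesis = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
print(round(corpus_bleu([hypothesis], [reference]), 2))
```

BLEU values depend heavily on tokenization, which matters for a morphologically rich language such as Arabic. Published comparisons usually rely on a standard tool such as sacreBLEU with a fixed, reported configuration instead of a hand-written implementation like this one.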
Merits
Comprehensive Domain Coverage
ASCAT’s inclusion of five distinct scientific domains enhances generalizability and applicability across scientific translation contexts.
Robust Validation Process
The inclusion of human validation by domain experts at multiple linguistic levels elevates the quality and reliability of the corpus.
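One way to picture the translate-then-validate workflow is as a record per abstract that collects one candidate per engine and one expert verdict per linguistic level. The sketch below is purely illustrative: the engine labels and the three levels come from the abstract, but the `AbstractRecord` class, its field names, and the rule that all three checks must pass are assumptions, not the authors' actual data format.

```python
from dataclasses import dataclass, field

# Engine labels and validation levels are taken from the abstract; everything
# else in this sketch (class, fields, acceptance rule) is hypothetical.
ENGINES = ("gemini", "quickmt-en-ar", "google-translate", "deepl")
LEVELS = ("lexical", "syntactic", "semantic")


@dataclass
class AbstractRecord:
    source_en: str
    domain: str
    candidates: dict = field(default_factory=dict)  # engine name -> Arabic candidate
    checks: dict = field(default_factory=dict)      # level -> expert verdict (bool)
    final_ar: str = ""                              # text accepted by the expert

    def add_candidate(self, engine, text):
        """Store the output of one translation engine."""
        if engine not in ENGINES:
            raise ValueError(f"unknown engine: {engine}")
        self.candidates[engine] = text

    def record_check(self, level, passed):
        """Store an expert verdict for one linguistic level."""
        if level not in LEVELS:
            raise ValueError(f"unknown validation level: {level}")
        self.checks[level] = bool(passed)

    def is_validated(self):
        """True only if a final text exists and all three levels passed."""
        return bool(self.final_ar) and all(self.checks.get(level) is True for level in LEVELS)


record = AbstractRecord(source_en="(English abstract text)", domain="physics")
record.add_candidate("deepl", "(Arabic candidate text)")
print(record.is_validated())  # False: no final text and no expert checks yet
```

Keeping the per-engine candidates next to the validated text would also let later users study which engines the experts tended to accept or correct, although the abstract does not say whether the released corpus includes them.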
Demerits
Scalability Constraint
At 67,293 English tokens and 60,026 Arabic tokens, the corpus is sized for evaluation. It is likely too small to train or fine-tune large models without augmentation from additional data.
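The size figures can be turned into two simple descriptive statistics. The short sketch below uses only the numbers quoted from the abstract; the helper name `type_token_ratio` and the choice of statistics are illustrative, and the abstract does not say how tokens or words were counted.

```python
# Figures quoted from the abstract.
EN_TOKENS = 67_293
AR_TOKENS = 60_026
AR_UNIQUE_WORDS = 17_604
EN_WORDS_PER_ABSTRACT = 141.7
AR_WORDS_PER_ABSTRACT = 111.78


def type_token_ratio(unique_count, token_count):
    """Share of tokens that are distinct word forms (higher = more varied)."""
    return unique_count / token_count


def type_token_ratio_from_tokens(tokens):
    """Same statistic computed directly from a token list."""
    return len(set(tokens)) / len(tokens)


# Arabic type-token ratio implied by the reported vocabulary and token counts.
ar_ttr = type_token_ratio(AR_UNIQUE_WORDS, AR_TOKENS)

# Ratio of average English to average Arabic abstract length in words.
length_ratio = EN_WORDS_PER_ABSTRACT / AR_WORDS_PER_ABSTRACT

print(f"Arabic type-token ratio: {ar_ttr:.3f}")
print(f"English/Arabic words per abstract: {length_ratio:.3f}")
```

Under these figures, roughly 29% of the Arabic tokens are distinct word forms, which is consistent with the abstract's remark about morphological richness. Type-token ratios depend on corpus size and tokenization, so they should only be compared across corpora of similar size.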
Expert Commentary
ASCAT is a significant contribution to computational linguistics and scientific translation. Its design reflects the challenges posed by Arabic morphology and by the complexity of scientific content. The decision to target full abstracts rather than isolated sentences is particularly commendable: it matches real-world translation scenarios and allows evaluation of semantic coherence across sentences, which sentence-level corpora cannot capture. The multi-engine translation strategy, combined with expert validation, is a hybrid approach that balances scalability with quality assurance. The benchmarking results (BLEU from 23.68 to 37.07) show clear differences between current LLMs and suggest that even strong models struggle with domain-specific nuances in Arabic, underscoring the need for domain-adapted training data. Beyond filling a gap, the corpus offers a model for future corpus development for underrepresented languages.
Recommendations
- Extend ASCAT to additional scientific domains such as medicine or engineering to broaden applicability.
- Develop a standardized evaluation protocol based on ASCAT for use in academic research and industry MT evaluation.
Sources
Original: arXiv - cs.CL