
Just Use XML: Revisiting Joint Translation and Label Projection

Thennal D K, Chris Biemann, Hans Ole Hatzel

arXiv:2603.12021v1

Abstract: Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over comparable work, up to +39.9 F1 on NER. Overall, our results demonstrate that XML-tagged label projection provides effective and efficient label transfer without compromising translation quality.

Executive Summary

The article presents LabelPigeon, a framework that performs machine translation and label projection jointly by marking annotated spans with XML tags. It challenges the prior finding that combining the two steps degrades translation quality. Using a direct evaluation scheme for label projection that they designed, the authors find that LabelPigeon outperforms baselines and improves translation quality in 11 languages. A broader assessment across 203 languages and varying annotation complexity shows consistent translation improvement, which the authors attribute to the additional fine-tuning. In downstream cross-lingual transfer across 27 languages and three tasks, they report substantial gains over comparable work, up to +39.9 F1 on NER. The work offers a compelling alternative to the conventional translate-then-project workflow, with clear relevance to low-resource language processing.

Key Points

  • LabelPigeon combines translation and label projection via XML tags
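  • Contrary to prior reports, joint processing does not degrade translation quality; it improves it, an effect the authors attribute to additional fine-tuning
  • Across 27 languages and three downstream tasks, cross-lingual transfer improves over comparable work by up to +39.9 F1 on NER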
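The NER figure is an F1 score; since +39.9 cannot be a difference on a 0-1 scale, it is expressed in points on a 0-100 scale. For orientation, here is a minimal sketch of conventional exact-match entity-level F1, as used in CoNLL-style NER scoring. The summary does not state which scorer the authors used, so `span_f1` and its span format are hypothetical illustrations of the metric, not their evaluation code.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical span format: (start, end, label) character offsets.
Span = tuple[int, int, str]


def span_f1(gold: list[Span], pred: list[Span]) -> float:
    """Exact-match entity-level F1 on a 0-1 scale.

    A predicted span is a true positive only if its start, end and label
    all match a gold span; partial overlaps earn no credit.
    """
    g, p = Counter(gold), Counter(pred)
    tp = sum((g & p).values())  # multiset intersection = exact matches
    if tp == 0:
        return 0.0
    precision = tp / sum(p.values())
    recall = tp / sum(g.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 12, "PER"), (22, 28, "LOC")]
    pred = [(0, 12, "PER"), (22, 28, "ORG")]  # right span, wrong label
    print(100 * span_f1(gold, pred))  # 50.0 (precision 0.5, recall 0.5)
```

Because a mislabelled or misaligned span counts as both a false positive and a false negative, errors in projected labels depress F1 quickly, which is why projection quality matters so much for downstream transfer.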

Merits

Innovative Integration

By marking annotated spans with XML tags, LabelPigeon performs translation and label projection in the same step, and it avoids the translation-quality degradation reported for earlier joint approaches.
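This summary does not describe the paper's exact tag scheme or model. As a minimal sketch, assuming non-overlapping character-offset spans and one tag per entity type (e.g. `<PER>Ada</PER>`), the round trip works like this: source annotations are wrapped in tags, the tagged sentence is translated, and the tags in the output are parsed back into target-side spans. `encode_spans`, `decode_spans`, and the tag format are hypothetical, and the "translated" string is hard-coded in place of a real MT model.

```python
from __future__ import annotations

import re
from xml.sax.saxutils import escape, unescape

# Hypothetical span format: (start, end, label) character offsets, end exclusive.
Span = tuple[int, int, str]

# One well-formed tag pair, e.g. <PER>Ada</PER>; labels must be word characters.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def encode_spans(text: str, spans: list[Span]) -> str:
    """Wrap each labelled span in an XML-style tag before translation.

    Assumes spans do not overlap. Plain text is XML-escaped so that a
    literal '<' or '&' in the sentence cannot be mistaken for markup.
    """
    parts, pos = [], 0
    for start, end, label in sorted(spans):
        parts.append(escape(text[pos:start]))
        parts.append(f"<{label}>{escape(text[start:end])}</{label}>")
        pos = end
    parts.append(escape(text[pos:]))
    return "".join(parts)


def decode_spans(tagged: str) -> tuple[str, list[Span]] | None:
    """Strip tags from a translated string and recover target-side spans.

    Returns None if any '<' or '>' survives outside well-formed tag pairs,
    i.e. the model dropped, nested, or mangled a tag.
    """
    parts: list[str] = []
    spans: list[Span] = []
    length = 0  # characters emitted so far into the plain target text
    pos = 0
    for m in TAG.finditer(tagged):
        gap, inner = tagged[pos:m.start()], m.group(2)
        if "<" in gap + inner or ">" in gap + inner:
            return None
        gap_text, inner_text = unescape(gap), unescape(inner)
        parts.append(gap_text)
        length += len(gap_text)
        spans.append((length, length + len(inner_text), m.group(1)))
        parts.append(inner_text)
        length += len(inner_text)
        pos = m.end()
    tail = tagged[pos:]
    if "<" in tail or ">" in tail:
        return None
    parts.append(unescape(tail))
    return "".join(parts), spans


if __name__ == "__main__":
    source = "Ada Lovelace lives in London."
    tagged = encode_spans(source, [(0, 12, "PER"), (22, 28, "LOC")])
    print(tagged)  # <PER>Ada Lovelace</PER> lives in <LOC>London</LOC>.
    # Stand-in for a real MT model that was asked to keep the tags.
    translated = "<PER>Ada Lovelace</PER> wohnt in <LOC>London</LOC>."
    print(decode_spans(translated))
    # ('Ada Lovelace wohnt in London.', [(0, 12, 'PER'), (22, 28, 'LOC')])
```

The design choice this illustrates is that alignment comes for free: the spans are located in the target by the tags themselves, so no separate word aligner is needed, and a malformed output can be detected and discarded rather than silently mis-projected.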

Empirical Validation

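The evaluation is broad: direct label-projection evaluation in 11 languages, translation quality across 203 languages and varying annotation complexity, and cross-lingual transfer across 27 languages and three downstream tasks. Together these substantiate the claim of improved performance.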

Demerits

Generalizability Concern

Results rely on a specific XML-tag implementation; applicability to non-XML formats or other annotation systems remains unproven.

Expert Commentary

This paper represents a significant advance in cross-lingual NLP. The authors effectively challenge a widely held belief by showing that combining translation and label projection, via a well-designed XML-tag interface, does not compromise translation quality and in their experiments improves it, an improvement they attribute to the additional fine-tuning. The choice of XML as a structural anchor is particularly insightful, as it aligns with existing annotation conventions and facilitates interoperability. Moreover, the gains across many languages and three downstream tasks suggest a broadly applicable improvement rather than a niche one. The breadth of the translation-quality assessment, covering 203 languages and varying annotation complexity, adds to the study's credibility. The limitation regarding XML dependency is valid, but it is outweighed by the methodological contribution and its empirical support. This work sets a strong reference point and may prompt a reevaluation of best practices in cross-lingual annotation workflows.

Recommendations

  • Adopt LabelPigeon or similar XML-tag frameworks in cross-lingual transfer pipelines that project span annotations into low-resource languages, where efficiency and translation quality are critical.
  • Encourage further research to extend this methodology to non-XML annotation systems and diverse linguistic typologies.
