Academic

Academic · 1 min

FLUX: Data Worth Training On

arXiv:2603.13972v1 · Announce Type: new · Abstract: Modern large language model training is no longer limited by data availability, but by the inability of existing preprocessing pipelines …

Gowtham, Sai Rupesh, Sanjay Kumar, Saravanan, Venkata Chaithanya
Academic · 1 min

SemEval-2026 Task 6: CLARITY -- Unmasking Political Question Evasions

arXiv:2603.14027v1 · Announce Type: new · Abstract: Political speakers often avoid answering questions directly while maintaining the appearance of responsiveness. Despite its importance for public discourse, such …

Konstantinos Thomas, Giorgos Filandrianos, Maria Lymperaiou, Chrysoula Zerva, Giorgos Stamou
Academic · 1 min

NepTam: A Nepali-Tamang Parallel Corpus and Baseline Machine Translation Experiments

arXiv:2603.14053v1 · Announce Type: new · Abstract: Modern Translation Systems heavily rely on high-quality, large parallel datasets for state-of-the-art performance. However, such resources are largely unavailable for …

Rupak Raj Ghimire, Bipesh Subedi, Balaram Prasain, Prakash Poudyal, Praveen Acharya, Nischal Karki, Rupak Tiwari, Rishikesh Kumar Sharma, Jenny Poudel, Bal Krishna Bal
Academic · 1 min

OasisSimp: An Open-source Asian-English Sentence Simplification Dataset

arXiv:2603.14111v1 · Announce Type: new · Abstract: Sentence simplification aims to make complex text more accessible by reducing linguistic complexity while preserving the original meaning. However, progress …

Hannah Liu, Muxin Tian, Iqra Ali, Haonan Gao, Qiaoyiwen Wu, Blair Yang, Uthayasanker Thayasivam, En-Shiun Annie Lee, Pakawat Nakwijit, Surangika Ranathunga, Ravi Shekhar
Academic · 1 min

The GELATO Dataset for Legislative NER

arXiv:2603.14130v1 · Announce Type: new · Abstract: This paper introduces GELATO (Government, Executive, Legislative, and Treaty Ontology), a dataset of U.S. House and Senate bills from the …

Matthew Flynn, Timothy Obiso, Sam Newman