Morphemes Without Borders: Evaluating Root-Pattern Morphology in Arabic Tokenizers and LLMs
arXiv:2603.15773v1
Abstract: This work investigates how effectively large language models (LLMs) and their tokenization schemes represent and generate Arabic root-pattern morphology, probing …