TurkicNLP: An NLP Toolkit for Turkic Languages
arXiv:2602.19174v1 Announce Type: new Abstract: Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most …
All Articles
arXiv:2602.19177v1 Announce Type: new Abstract: The increasing use of Large Language Models (LLMs) as proxies for human participants in social science research presents a promising, …
arXiv:2602.19212v1 Announce Type: new Abstract: Hateful content on social media increasingly appears as multimodal memes that combine images and text to convey harmful narratives. In …
arXiv:2602.19317v1 Announce Type: new Abstract: Personalization in Question Answering (QA) requires answers that are both accurate and aligned with users' background, preferences, and historical context. …
arXiv:2602.19320v1 Announce Type: new Abstract: Agentic memory systems enable large language model (LLM) agents to maintain state across long interactions, supporting long-horizon reasoning and personalization …
arXiv:2602.19333v1 Announce Type: new Abstract: This research introduces the first large-scale, well-balanced Persian social media text classification dataset, specifically designed to address the lack of …
arXiv:2602.19509v1 Announce Type: new Abstract: Large Language Models (LLMs) face a persistent trade-off between inference cost and reasoning capability. While "Oracle" models (e.g., Llama-3-70B) achieve …
arXiv:2602.19526v1 Announce Type: new Abstract: Deep Research agents tackle knowledge-intensive tasks through multi-round retrieval and decision-oriented generation. While reinforcement learning (RL) has been shown to …
arXiv:2602.19543v1 Announce Type: new Abstract: Knowledge hypergraphs surpass traditional binary knowledge graphs by encapsulating complex $n$-ary atomic facts, providing a more comprehensive paradigm for semantic …
arXiv:2602.19548v1 Announce Type: new Abstract: One of the first pre-processing steps for constructing web-scale LLM pretraining datasets involves extracting text from HTML. Despite the immense …
arXiv:2602.19549v1 Announce Type: new Abstract: Visual Document Retrieval (VDR), which aims to retrieve relevant pages within vast corpora of visually-rich documents, is of significance in …
arXiv:2602.19569v1 Announce Type: new Abstract: Question Answering over Temporal Knowledge Graphs (TKGQA) has attracted growing interest for handling time-sensitive queries. However, existing methods still struggle …