Academic


Academic · 1 min

Faster Superword Tokenization

arXiv:2604.05192v1 Announce Type: new Abstract: Byte Pair Encoding (BPE) is a widely used tokenization algorithm, whose tokens cannot extend across pre-tokenization boundaries, functionally limiting it …

Craig W. Schmidt, Chris Tanner, Yuval Pinter
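The first abstract notes that BPE tokens cannot extend across pre-tokenization boundaries. The sketch below shows where that constraint comes from, using textbook BPE training rather than the authors' method. It assumes a hypothetical whitespace pre-tokenizer (`pretokenize`) and a hypothetical helper (`train_bpe`). Pair frequencies are counted only inside each pre-token, so no learned token can ever contain a space.

```python
from collections import Counter


def pretokenize(text):
    # Hypothetical pre-tokenizer (assumption): split on whitespace only.
    return text.split()


def train_bpe(text, num_merges):
    # Represent each pre-token as a tuple of characters, weighted by frequency.
    words = Counter(tuple(w) for w in pretokenize(text))
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs *within* each pre-token only;
        # pairs spanning two pre-tokens are never considered.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every pre-token is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair (first on ties)
        merges.append(best)
        # Apply the chosen merge to every word.
        new_words = Counter()
        for word, freq in words.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_words[tuple(merged)] += freq
        words = new_words
    vocab = {sym for word in words for sym in word}
    return merges, vocab


# Usage: a frequent phrase still ends up as three separate tokens.
merges, vocab = train_bpe("in the end in the end in the end", num_merges=10)
print(merges)
print(sorted(vocab))
```

Even though "in the end" repeats, the learned vocabulary stops at the word level ("in", "the", "end"), which is the boundary limitation the abstract describes.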
Academic · 1 min

Weight-Informed Self-Explaining Clustering for Mixed-Type Tabular Data

arXiv:2604.05857v1 Announce Type: new Abstract: Clustering mixed-type tabular data is fundamental for exploratory analysis, yet remains challenging due to misaligned numerical-categorical representations, uneven and context-dependent …

Lehao Li, Qiang Huang, Yihao Ang, Bryan Kian Hsiang Low, Anthony K. H. Tung, Xiaokui Xiao