MOSAIC: Composable Safety Alignment with Modular Control Tokens
arXiv:2603.16210v1
Abstract: Safety alignment in large language models (LLMs) is commonly implemented as a single static policy embedded in model parameters. However, …
Jingyu Peng, Hongyu Chen, Jiancheng Dong, Maolin Wang, Wenxi Li, Yuchen Li, Kai Zhang, Xiangyu Zhao