Sabiá-4 Technical Report

arXiv:2603.10213v1. Abstract: This technical report presents Sabiá-4 and Sabiazinho-4, a new generation of Portuguese language models with a focus on Brazilian Portuguese. The models were developed through a four-stage training pipeline: continued pre-training on Portuguese and Brazilian legal corpora, long-context extension to 128K tokens, supervised fine-tuning on instruction data spanning chat, code, legal tasks, and function calling, and preference alignment. We evaluate the models on six benchmark categories: conversational capabilities in Brazilian Portuguese, knowledge of Brazilian legislation, long-context understanding, instruction following, standardized exams, and agentic capabilities including tool use and web navigation. Results show that Sabiá-4 and Sabiazinho-4 achieve a favorable cost-performance trade-off compared to other models, positioning them in the upper-left region of the pricing-accuracy chart. The models show improvements over previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion.

Executive Summary

This technical report presents Sabiá-4 and Sabiazinho-4, a new generation of Portuguese language models focused on Brazilian Portuguese. The models are trained through a four-stage pipeline: continued pre-training on Portuguese and Brazilian legal corpora, long-context extension to 128K tokens, supervised fine-tuning on instruction data, and preference alignment. According to the abstract, the models improve on previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion. The report also places them in the upper-left region of its pricing-accuracy chart, which it presents as a favorable cost-performance trade-off relative to other models. That combination makes them an attractive option for applications that need conversational ability in Brazilian Portuguese, knowledge of Brazilian legislation, or agentic behavior.

Key Points

  • Sabiá-4 and Sabiazinho-4 are a new generation of Portuguese language models focused on Brazilian Portuguese
  • Four-stage training pipeline includes continued pre-training, long-context extension, supervised fine-tuning, and preference alignment
  • Evaluation shows improvements over previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion
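
The four-stage pipeline named in the key points can be pictured as an ordered sequence in which each stage starts from the checkpoint produced by the previous one. The sketch below is only an illustration of that ordering: the stage descriptions paraphrase the abstract, while the `Stage` class, the identifiers, and the `run_pipeline` helper are hypothetical and do not come from the report. No training happens here; the function only records the lineage of checkpoint labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str          # hypothetical identifier, not from the report
    description: str   # paraphrased from the abstract


# Stage order follows the abstract; everything else is an assumption.
PIPELINE = (
    Stage("continued_pretraining", "Portuguese and Brazilian legal corpora"),
    Stage("long_context_extension", "context extended to 128K tokens"),
    Stage("supervised_fine_tuning", "chat, code, legal tasks, function calling"),
    Stage("preference_alignment", "preference alignment"),
)


def run_pipeline(checkpoint: str, stages=PIPELINE) -> list[str]:
    """Thread a checkpoint label through each stage in order.

    Returns the list of checkpoint labels, starting with the input label.
    """
    lineage = [checkpoint]
    for stage in stages:
        checkpoint = f"{checkpoint}+{stage.name}"
        lineage.append(checkpoint)
    return lineage


if __name__ == "__main__":
    for label in run_pipeline("base"):
        print(label)
```

The tuple makes the ordering explicit: long-context extension comes after continued pre-training and before instruction tuning, as the abstract lists them.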

Merits

Strength in Legal Applications

The models' continued pre-training on Portuguese and Brazilian legal corpora is the likely source of their reported improvements in legal document drafting, and knowledge of Brazilian legislation is one of the six benchmark categories the report evaluates. This makes them a natural candidate for applications that require these skills.

Favorable Cost-Performance Trade-Off

The report places both models in the upper-left region of its pricing-accuracy chart and reads this as a favorable cost-performance trade-off compared to other models. If that holds, they are a cost-effective choice for applications that need conversational capabilities and agentic task completion.
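
One way to make the "upper-left region" reading concrete is a Pareto check on price and accuracy. The sketch below assumes price is on the horizontal axis (lower is better) and accuracy on the vertical axis (higher is better), which the abstract does not state explicitly. The model names and numbers are placeholders for illustration only and are not taken from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPoint:
    name: str
    price: float     # hypothetical cost, lower is better
    accuracy: float  # hypothetical accuracy in [0, 1], higher is better


def dominates(a: ModelPoint, b: ModelPoint) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.price <= b.price and a.accuracy >= b.accuracy
    strictly_better = a.price < b.price or a.accuracy > b.accuracy
    return no_worse and strictly_better


def pareto_front(points: list[ModelPoint]) -> list[ModelPoint]:
    """Return the points that no other point dominates, sorted by price."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.price)


# Placeholder values for illustration only; not results from the report.
EXAMPLE_POINTS = [
    ModelPoint("model_a", price=1.0, accuracy=0.80),
    ModelPoint("model_b", price=5.0, accuracy=0.82),
    ModelPoint("model_c", price=4.0, accuracy=0.70),
    ModelPoint("model_d", price=0.5, accuracy=0.60),
]

if __name__ == "__main__":
    for point in pareto_front(EXAMPLE_POINTS):
        print(point.name, point.price, point.accuracy)
```

In this toy data, `model_c` drops out because `model_a` is both cheaper and more accurate. A model in the upper-left of such a chart is one that few or no competitors dominate in this sense.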

Demerits

Limited Domain Knowledge

The reported gains are concentrated in legal and Brazilian Portuguese tasks. The abstract does not say how well the models generalize to other areas, such as cultural or social contexts, so their knowledge outside the evaluated domains may be limited.

Potential Bias

The models' training data may contain biases, which could impact their performance and accuracy, particularly in sensitive areas such as legal applications where fairness and impartiality are crucial.

Expert Commentary

Sabiá-4 and Sabiazinho-4 improve on previous generations in legal document drafting, multi-turn dialogue quality, and agentic task completion, which makes them an attractive option for applications that need conversational ability, knowledge of Brazilian legislation, or agentic behavior. Two concerns remain: possibly limited knowledge outside the evaluated domains, and potential bias inherited from the training data. The abstract addresses neither, and both matter most in legal applications, where fairness and impartiality are crucial. The breadth of the evaluation, which spans six benchmark categories, is a strength, but it should be complemented by explicit evaluation and mitigation of bias.

Recommendations

  • Develop and deploy more robust evaluation frameworks that measure both the effectiveness and the fairness of language models in legal applications.
  • Apply careful data curation and bias mitigation during training, particularly for sensitive uses such as legal applications where impartiality is essential.

Sources