CN-Buzz2Portfolio: A Chinese-Market Dataset and Benchmark for LLM-Based Macro and Sector Asset Allocation from Daily Trending Financial News

arXiv:2603.22305v1

Abstract: Large Language Models (LLMs) are rapidly transitioning from static Natural Language Processing (NLP) tasks including sentiment analysis and event extraction to acting as dynamic decision-making agents in complex financial environments. However, the evolution of LLMs into autonomous financial agents faces a significant dilemma in evaluation paradigms. Direct live trading is irreproducible and prone to outcome bias by confounding luck with skill, whereas existing static benchmarks are often confined to entity-level stock picking and ignore broader market attention. To facilitate the rigorous analysis of these challenges, we introduce CN-Buzz2Portfolio, a reproducible benchmark grounded in the Chinese market that maps daily trending news to macro and sector asset allocation. Spanning a rolling horizon from 2024 to mid-2025, our dataset simulates a realistic public attention stream, requiring agents to distill investment logic from high-exposure narratives instead of pre-filtered entity news. We propose a Tri-Stage CPA Agent Workflow involving Compression, Perception, and Allocation to evaluate LLMs on broad asset classes such as Exchange Traded Funds (ETFs) rather than individual stocks, thereby reducing idiosyncratic volatility. Extensive experiments on nine LLMs reveal significant disparities in how models translate macro-level narratives into portfolio weights. This work provides new insights into the alignment between general reasoning and financial decision-making, and all data, codes, and experiments are released to promote sustainable financial agent research.
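The abstract's remark that allocating across ETFs rather than individual stocks reduces idiosyncratic volatility follows from a standard diversification argument. As an illustrative sketch only: the single-factor model, the equal weights, the assumption of uncorrelated residuals with equal variance, and all symbols below are assumptions made here, not notation taken from the paper.

```latex
r_i = \beta_i r_m + \varepsilon_i, \qquad
\operatorname{Var}(\varepsilon_i) = \sigma_\varepsilon^2, \qquad
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \quad (i \neq j)

r_p = \frac{1}{N} \sum_{i=1}^{N} r_i
\;\Longrightarrow\;
\operatorname{Var}\!\left( \frac{1}{N} \sum_{i=1}^{N} \varepsilon_i \right)
= \frac{\sigma_\varepsilon^2}{N}
```

Under these assumptions the stock-specific part of the variance shrinks as the basket grows, while the market-driven part does not. Real ETFs are usually not equal-weighted and their constituents' residuals are rarely fully uncorrelated, so the 1/N rate is an idealization.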

Executive Summary

The article introduces CN-Buzz2Portfolio, a Chinese-market dataset and benchmark for Large Language Model (LLM)-based macro and sector asset allocation driven by daily trending financial news. The authors propose a Tri-Stage CPA Agent Workflow (Compression, Perception, Allocation) that evaluates LLMs on broad asset classes such as Exchange Traded Funds (ETFs) rather than individual stocks, and they run experiments on nine LLMs. The results show significant disparities in how models translate macro-level narratives into portfolio weights, which the authors present as evidence on how well general reasoning aligns with financial decision-making. The benchmark targets a specific evaluation dilemma: live trading is irreproducible and confounds luck with skill, while existing static benchmarks are often confined to entity-level stock picking.

Key Points

  • Introduction of CN-Buzz2Portfolio, a Chinese-market dataset and benchmark for LLM-based macro and sector asset allocation from daily trending financial news
  • Proposed Tri-Stage CPA Agent Workflow (Compression, Perception, Allocation) for evaluating LLMs on broad asset classes such as ETFs
  • Extensive experiments on nine LLMs reveal significant disparities in how models translate macro-level narratives into portfolio weights
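To make the three stages named above concrete, the following is a minimal sketch of a Compression, Perception, Allocation pipeline. It is not the authors' implementation: the asset names, the keyword map, and every function name are hypothetical, and simple keyword counting stands in for the LLM reasoning that the paper actually evaluates.

```python
# Hypothetical asset universe and keyword map; neither comes from the paper.
SECTOR_KEYWORDS = {
    "tech_etf": ["chip", "semiconductor", "software"],
    "energy_etf": ["oil", "coal", "solar"],
    "bond_etf": ["rate cut", "treasury", "yield"],
}


def compress(headlines, max_items=5):
    """Compression: lowercase, drop blanks and duplicates, keep the first max_items."""
    seen, kept = set(), []
    for headline in headlines:
        key = headline.strip().lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept[:max_items]


def perceive(compressed):
    """Perception: count keyword hits per asset (a crude stand-in for an LLM)."""
    scores = {asset: 0 for asset in SECTOR_KEYWORDS}
    for text in compressed:
        for asset, words in SECTOR_KEYWORDS.items():
            scores[asset] += sum(word in text for word in words)
    return scores


def allocate(scores):
    """Allocation: long-only weights proportional to score; equal weights if all zero."""
    total = sum(scores.values())
    if total == 0:
        return {asset: 1 / len(scores) for asset in scores}
    return {asset: score / total for asset, score in scores.items()}


def cpa(headlines):
    """Run the three stages in order and return portfolio weights summing to 1."""
    return allocate(perceive(compress(headlines)))


if __name__ == "__main__":
    news = [
        "Chip exports surge",
        "chip exports surge",  # duplicate, removed by compress
        "Oil prices fall",
        "Semiconductor subsidy announced",
    ]
    print(cpa(news))
```

For the sample headlines, two distinct items mention technology keywords and one mentions an energy keyword, so the sketch puts two thirds of the weight on the hypothetical technology ETF, one third on energy, and none on bonds. Keeping the stages as separate functions mirrors the staged design described in the abstract and would let each stage be replaced by a model call independently.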

Merits

Strength in addressing evaluation dilemma

The benchmark addresses the evaluation dilemma facing LLM-based financial agents: live trading cannot be reproduced and confounds luck with skill, while existing static benchmarks are often confined to entity-level stock picking. Because the benchmark is reproducible and spans a rolling horizon from 2024 to mid-2025, it can simulate a realistic public attention stream and present the same inputs to every model under comparison.
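One way to picture what a reproducible, rolling evaluation buys is a deterministic replay: given a fixed sequence of daily allocations and a fixed sequence of realized returns, the wealth curve is always the same. The sketch below assumes that the weights chosen on day t are applied to the returns of the following period, and it ignores transaction costs. The function name and data layout are hypothetical, and the paper's actual metrics and protocol are not described in this summary.

```python
def rolling_backtest(daily_weights, next_period_returns):
    """Compound portfolio returns step by step.

    daily_weights[t] maps asset -> weight chosen from news available on day t.
    next_period_returns[t] maps asset -> realized return over the period after
    day t, so no decision uses information from its own evaluation period.
    """
    wealth = 1.0
    curve = []
    for weights, returns in zip(daily_weights, next_period_returns):
        period_return = sum(w * returns[asset] for asset, w in weights.items())
        wealth *= 1.0 + period_return
        curve.append(wealth)
    return curve


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the benchmark.
    weights = [{"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.0}]
    returns = [{"a": 0.02, "b": 0.0}, {"a": -0.01, "b": 0.05}]
    print(rolling_backtest(weights, returns))
```

With the toy inputs the portfolio gains 1% in the first period and loses 1% in the second, ending at roughly 0.9999 of its starting value. Rerunning the function on the same inputs always yields the same curve, which is the property live trading lacks.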

Insights into alignment between general reasoning and financial decision-making

The study reveals significant disparities in how models translate macro-level narratives into portfolio weights, providing new insights into the alignment between general reasoning and financial decision-making.

Release of data, code, and experiments for sustainable financial agent research

By releasing all data, code, and experiments, the authors make the work reproducible and let other researchers build upon and extend it.

Demerits

Limitation to Chinese market

The study is limited to the Chinese market, which may not be representative of other financial markets. The authors' findings may not be generalizable to other markets or asset classes.

Dependence on LLM performance

The study's results depend on the nine LLMs used in the experiments. Since those models differ in how well they translate macro-level narratives into portfolio weights, the findings may not carry over to models outside that set.

Expert Commentary

The article makes a significant contribution to research on LLM-based financial agents by addressing the dilemma of how to evaluate them. The proposed Tri-Stage CPA Agent Workflow and the CN-Buzz2Portfolio benchmark provide a rigorous and reproducible framework for evaluating LLMs on broad asset classes. The findings highlight that general reasoning ability does not automatically align with financial decision-making, which developers of LLM-based allocation systems should take into account. However, the restriction to the Chinese market and the dependence on the particular LLMs tested are notable limitations that future research should address.

Recommendations

  • Future research should extend the approach to other financial markets and asset classes to test whether the observed disparities between LLMs persist.
  • Researchers should weigh the potential risks and benefits of using LLMs in financial decision-making and use their findings to inform policy discussions around regulation.

Sources

Original: arXiv - cs.LG