
Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections

arXiv:2604.00284v1 Abstract: We formally introduce an improvisational wordplay game called Connections to explore the reasoning capabilities of AI agents. Playing Connections combines skills in knowledge retrieval, summarization, and awareness of the cognitive states of other agents. We show how the game serves as a good benchmark for the social intelligence of language-model-based agents, going beyond the agents' own memory and deductive reasoning to include gauging the understanding capabilities of other agents. Finally, we show how, through communication with other agents in a constrained environment, AI agents must demonstrate social awareness and intelligence in games involving collaboration.

Gaurav Rajesh Parikh, Angikar Ghosal

Executive Summary

The article introduces 'Connections,' an improvisational wordplay game, as a novel benchmark for evaluating the social intelligence of AI agents. By requiring agents to retrieve knowledge, summarize information, and assess the cognitive states of other agents, Connections transcends traditional memory-based or deductive reasoning tests. The authors argue that the game effectively measures social awareness and collaborative intelligence in constrained environments, where communication and mutual understanding are critical. This approach highlights the limitations of current AI models in gauging social cognition and underscores the need for frameworks that evaluate interactional competencies beyond individual task performance.
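To make the setup concrete, here is a minimal sketch of the kind of clue-giver/guesser round a Connections-style collaborative benchmark implies. Nothing below comes from the paper itself: the prompts, the word groups, and the `llm_call` stub are illustrative assumptions, standing in for whatever model interface an implementation would use.

```python
# Minimal sketch of a Connections-style collaborative round between two agents.
# Everything here (prompts, word groups, the llm_call stub) is an illustrative
# assumption, not the authors' implementation.

WORDS = ["bass", "pike", "sole", "perch",        # fish
         "mercury", "venus", "mars", "saturn"]   # planets

GROUPS = {
    "fish": {"bass", "pike", "sole", "perch"},
    "planets": {"mercury", "venus", "mars", "saturn"},
}

def llm_call(prompt: str) -> str:
    """Stand-in for any chat-completion API; returns canned replies so the
    sketch runs without network access."""
    if "one-word clue" in prompt:
        return "water"                       # clue-giver compresses its target group
    return "bass, pike, sole, perch"         # guesser's reading of the clue

def play_round(target: str) -> bool:
    """One collaborative turn: the clue-giver hints at `target`, the guesser
    must recover the same four words from the clue alone."""
    board = ", ".join(sorted(WORDS))
    clue = llm_call(
        f"Board: {board}. Your partner cannot see the answer key. "
        f"Give a one-word clue for the group '{target}' that your partner "
        f"is likely to interpret the way you intend."
    )
    guess = llm_call(
        f"Board: {board}. Your partner's clue is '{clue}'. "
        f"List the four words you believe form the intended group."
    )
    guessed = {w.strip() for w in guess.split(",")}
    return guessed == GROUPS[target]         # success requires shared interpretation

if __name__ == "__main__":
    print("solved" if play_round("fish") else "failed")
```

Swapping the stub for a real model turns the equality check at the end into exactly the shared-interpretation test the summary describes: the clue-giver must anticipate how its partner will read the clue, not merely describe the group accurately.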

Key Points

  • Introduces Connections as a benchmark for social intelligence in AI agents, moving beyond traditional cognitive task evaluations.
  • Demonstrates that the game requires multi-faceted skills: knowledge retrieval, summarization, and awareness of others' cognitive states.
  • Emphasizes the collaborative and communicative dimensions of AI agents in constrained environments, highlighting social awareness as a key metric.

Merits

Novel Benchmark Framework

The article presents Connections as a groundbreaking benchmark that shifts focus from individual AI capabilities to social and interactional competencies, addressing a critical gap in AI evaluation.

Multi-Dimensional Evaluation

By incorporating knowledge retrieval, summarization, and social awareness, the framework provides a more holistic assessment of AI agents, capturing nuanced aspects of intelligence.
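One way to read "holistic" in practice is a per-round rubric that aggregates the three dimensions into a single score. The sketch below is purely illustrative; the field names, metrics, and weights are assumptions, not a formula from the paper.

```python
# Hypothetical per-round rubric combining the three dimensions named above;
# the metrics and weights are assumptions for illustration only.

from dataclasses import dataclass

@dataclass
class RoundScores:
    retrieval: float      # did the agent surface relevant facts about the words? (0-1)
    summarization: float  # how well the clue compresses the target group (0-1)
    social: float         # did the partner interpret the clue as intended? (0-1)

def composite(s: RoundScores, weights=(0.3, 0.3, 0.4)) -> float:
    """Weighted aggregate; the heavier weight on the social term reflects the
    paper's emphasis on modeling the other agent, not an official formula."""
    w_r, w_s, w_soc = weights
    return w_r * s.retrieval + w_s * s.summarization + w_soc * s.social

print(composite(RoundScores(retrieval=0.9, summarization=0.7, social=0.5)))  # ≈ 0.68
```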

Practical Relevance

The game-based approach offers a tangible and scalable method for evaluating AI in real-world collaborative scenarios, bridging theoretical and applied research.

Demerits

Limited Generalizability

The study focuses narrowly on Connections, leaving open questions about the applicability of this benchmark to other social intelligence tasks or domains.

Dependence on Linguistic Competency

The framework assumes high linguistic proficiency in AI agents, which may not be representative of broader social intelligence or non-linguistic interactions.

Constraints of Game-Based Evaluation

The constrained environment of the game may not fully replicate the complexities and unpredictability of real-world social interactions.

Expert Commentary

The authors make a compelling case for redefining AI evaluation metrics to include social intelligence, a domain long overlooked in favor of technical or task-specific benchmarks. While the introduction of Connections is innovative, it raises important questions about the scalability and generalizability of such game-based evaluations. The emphasis on collaborative intelligence is timely, particularly as AI systems increasingly interact with humans in complex social contexts. However, the reliance on linguistic prowess may inadvertently limit the scope of this benchmark to text-based interactions, overlooking non-verbal and multimodal aspects of social intelligence. Furthermore, the constrained nature of the game may not fully capture the dynamism of real-world social environments. Nonetheless, the article serves as a valuable contribution to the discourse on AI evaluation, pushing the field toward more holistic and human-centric metrics. Future work should explore hybrid benchmarks that integrate linguistic, visual, and contextual dimensions of social intelligence.

Recommendations

  • Expand the Connections framework to include multimodal interactions, incorporating visual and auditory cues to better reflect real-world social intelligence.
  • Develop a suite of complementary benchmarks that address different facets of social intelligence, such as emotional recognition, conflict resolution, and cultural adaptability, to create a more comprehensive evaluation toolkit.
  • Collaborate with cognitive scientists and psychologists to refine the theoretical underpinnings of social intelligence in AI, ensuring that benchmarks are grounded in empirical research on human social cognition.

Sources

Original: arXiv - cs.AI