Researchers at Carnegie Mellon University have introduced Sotopia, a platform designed to evaluate and enhance AI's social capabilities. Sotopia focuses on assessing AI's performance in goal-oriented social interactions, like collaboration, negotiation, and competition.
Key Findings:
Performance Evaluation: The platform supports testing and comparing different AI systems, with a specific emphasis on refining Mistral-7B.
Benchmarking: Sotopia uses GPT-4 as a benchmark for evaluating other AI systems' capabilities.
Technical Points:
Foundation: The approach builds on Mistral-7B, improving it through behavior cloning and self-reinforcement.
Multi-Dimensional Assessment: Sotopia evaluates AI performance across 7 social dimensions, including believability, adherence to social norms, and successful goal completion.
Data Collection: The platform gathers data from human-human, human-AI, and AI-AI interactions.
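To make the multi-dimensional assessment concrete, here is a minimal Python sketch of how per-dimension scores might be averaged for each system across several interaction episodes. It is not Sotopia's actual code or API: the function name `average_scores`, the dimension keys, the system names, and every score value are hypothetical, and only the three dimensions named above are included.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keys for the three dimensions named in the text;
# the remaining dimensions are not listed there, so they are omitted.
DIMENSIONS = ("believability", "social_norms", "goal_completion")


def average_scores(episodes):
    """Average each dimension's score per system across episodes.

    `episodes` is an iterable of (system_name, {dimension: score}) pairs.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for system, scores in episodes:
        for dim in DIMENSIONS:
            collected[system][dim].append(scores[dim])
    return {
        system: {dim: mean(values) for dim, values in dims.items()}
        for system, dims in collected.items()
    }


# Made-up scores for illustration only; these are not results from the platform.
episodes = [
    ("system_a", {"believability": 8, "social_norms": 6, "goal_completion": 6}),
    ("system_a", {"believability": 6, "social_norms": 8, "goal_completion": 8}),
    ("system_b", {"believability": 9, "social_norms": 9, "goal_completion": 7}),
]
summary = average_scores(episodes)
```

Keeping the dimensions separate, instead of collapsing them into one number, lets a comparison show where one system is believable but fails its goal, or the reverse.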