Overview

This challenge is part of the ReLearn Workshop at CVPR 2026 in Denver, Colorado. ReLearn studies the relationship between human and machine intelligence and how cognitive and psychological insights can guide the future of AI.

Theory of Mind (ToM)—the ability to infer others' mental states—is a cornerstone of human intelligence, making it an ideal testbed for evaluating how well AI systems can learn social reasoning from human behavior. We propose a challenge that leverages multimodal inputs (videos, textual scene descriptions, and dialogues between agents) to infer an agent's mental states, including goals, beliefs, and beliefs about others' goals. This setup introduces the additional complexity of identifying which modality contributes the essential cues for reasoning about unobserved mental states. While humans perform such reasoning reliably, current models remain far from human-level accuracy.

Building on recently released benchmarks, MMToM-QA (ACL'24) and MuMA-ToM (AAAI'25), we will host two tracks in this challenge: 1) reasoning from a single agent's behavior and 2) reasoning from multiple agents' interactions.

Track 1

Single-Agent Reasoning

Infer goals from video and textual scene descriptions.

Track 1 example: inferring an agent's goal from video and text (single-agent reasoning).
Go to participation →
Track 2

Multi-Agent Reasoning

Infer beliefs and social goals from multiple agents' interactions and dialogue.

Track 2 example: belief, social goal, and belief-of-goal inference from multi-agent interactions.
Go to participation →

Timeline

  • Registration & submission opens: February 23, 2026
  • Submissions due: May 3, 2026
  • Final results & winners announced: May 5, 2026
  • Winners submit technical report: May 19, 2026

For full workshop details, invited speakers, and call for papers, see the ReLearn workshop page.

Participation

Registration

To participate in the challenge, you must register your team by filling out the form below; registration is required. After you register, you will receive a Team ID—use this ID when submitting your results for Track 1 and Track 2.

Register your team (Google Form)

Track 1 Single-Agent Reasoning

Dataset

This track follows the MMToM-QA benchmark. You will need to generate answers to the questions in MMToM-QA/Benchmark/questions.json and store the results in the following JSON format for evaluation.

Submission format

Submit a single JSON file (.json).

  • One JSON array: one object per question.
  • Each object: question_id (integer) and answer (string).
  • Answer must be one of the benchmark options, e.g. "a" or "b".

Example: [ {"question_id": 1, "answer": "a"}, {"question_id": 2, "answer": "b"}, ... ]
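The submission file above can be produced with a short script. This is a minimal sketch, not official tooling: the helper `predict_answer` is a hypothetical stand-in for your own model, and it assumes `questions.json` is a JSON array with one entry per question, numbered from 1.

```python
import json

def predict_answer(question):
    """Hypothetical stand-in for your model's inference; must return "a" or "b"."""
    return "a"

def build_submission(questions_path, output_path):
    """Write a Track 1 submission: one {"question_id", "answer"} object per question."""
    with open(questions_path) as f:
        questions = json.load(f)  # assumed: a JSON array, one entry per question
    submission = [
        {"question_id": i, "answer": predict_answer(q)}
        for i, q in enumerate(questions, start=1)
    ]
    # Sanity-check the format before writing.
    assert all(entry["answer"] in ("a", "b") for entry in submission)
    with open(output_path, "w") as f:
        json.dump(submission, f)
```

Replace `predict_answer` with your inference pipeline; the sanity check guards against answers outside the allowed options.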

Submit your results

Obtain your Team ID by registering your team first. Submit a JSON array with one {"question_id", "answer"} object per question; each answer must be "a" or "b".

Track 2 Multi-Agent Reasoning

Dataset

This track follows the MuMA-ToM benchmark. You will need to generate answers to the questions in MUMA-TOM-BENCHMARK/questions.json and store the results in the following JSON format for evaluation.

Submission format

Submit a single JSON file (.json).

  • One JSON array: one object per question.
  • Each object: scenario_id (integer), question_id (integer, within that scenario), and answer (string).
  • Answer must be the option letter, e.g. "A", "B", or "C".

Example: [ {"scenario_id": 1, "question_id": 1, "answer": "B"}, ... ]
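Before submitting, it can help to check that your file matches the format above. The validator below is an unofficial sketch of those checks, not part of the benchmark's evaluation code:

```python
import json

def validate_track2_submission(path):
    """Check a Track 2 submission: a JSON array of objects with integer
    "scenario_id" and "question_id" fields and an "answer" of "A", "B", or "C".
    Returns the number of entries if valid; raises AssertionError otherwise."""
    with open(path) as f:
        entries = json.load(f)
    assert isinstance(entries, list), "top level must be a JSON array"
    for e in entries:
        assert isinstance(e.get("scenario_id"), int), "scenario_id must be an integer"
        assert isinstance(e.get("question_id"), int), "question_id must be an integer"
        assert e.get("answer") in ("A", "B", "C"), 'answer must be "A", "B", or "C"'
    return len(entries)
```

Running it on your `.json` file before uploading catches formatting mistakes (e.g. lowercase answer letters) that would otherwise fail evaluation.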

Submit your results

Obtain your Team ID by registering your team first. Submit a JSON array with one {"scenario_id", "question_id", "answer"} object per question; each answer must be "A", "B", or "C".

Help

General help

Dr. Xi Wang: xi.wang@inf.ethz.ch

Dr. Yen-Ling Kuo: ylkuo@virginia.edu

Submission help

Yan Zhuang: yyx3hd@virginia.edu