The workshop aims to bring together researchers working on scene-aware human modelling across animation, robotics, digital avatars, linguistics, and content creation. We envision HSI @ ECCV 2026 as a forum bridging insights from these diverse communities, each approaching shared challenges from different perspectives. This interdisciplinarity is reflected in our confirmed speakers, whose expertise spans gesture synthesis, 3D vision, character simulation, and robotics. To catalyse progress, the workshop will host a challenge based on the MM-Conv dataset, targeting scene-aware gesture generation from conversational context.
We will invite archival paper submissions for inclusion in the ECCV Workshop proceedings. Workshop topics include, but are not limited to:
Scene-conditioned human motion and behaviour generation
Vision-language-motion alignment and referential grounding
VLA models for scene-aware motion and manipulation
Human–object and human–scene interaction datasets, benchmarks, and evaluation metrics
We also welcome contributions on affordance learning, embodied data collection (motion capture, VR/AR, egocentric vision), physically-based simulation, and applications in character animation, robotics, and embodied communication.