Text-driven Motion Synthesis and Interaction Generation using Masked Deconstructed Diffusion and Multi-task Scene-aware Models
Abstract
This thesis introduces a new generative AI approach that addresses three long-standing challenges in human motion generation: accuracy, speed, and reliable alignment with user-written text. From a simple sentence, the system quickly produces natural, high-quality 3D movements that can be retargeted to digital characters for animation, virtual reality (VR), and games. Experiments demonstrate its practical value in VR, where the generated motions enhance immersion and responsiveness. Building on this, the thesis presents a second, scene-aware model that works with large language models to understand both the instruction and the surrounding scene: it can break long requests down into smaller steps and generate motions that interact with objects, for example, walking to a chair and then sitting down. Together, these contributions point toward more intuitive, text-driven tools for creating lifelike character animation.