Text2Motion
Goal
To develop an end-to-end pipeline that automates 3D character animation: given a static mesh and a text description, it produces a text-aligned motion sequence through automatic rigging, skinning, and motion synthesis.
Technical Approach
The system is organized in three stages:
- Automatic Rigging & Skinning: A PointNet++ model predicts joint positions and skinning weights directly from 3D mesh geometry, eliminating manual rigging (see the first sketch after this list).
- Text-Conditioned Motion Generation: A topology-aware transformer diffusion model generates motion sequences conditioned on natural-language descriptions, encoded with SBERT and T5 (second sketch below).
- Loss & Training: The model is trained with a combination of geodesic loss and InfoNCE contrastive loss to keep generated motion topologically consistent and semantically aligned with the text (third sketch below).
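
To make the rigging stage concrete, here is a minimal PyTorch sketch of a rig-prediction head. A point-wise MLP stands in for the hierarchical PointNet++ backbone, and a per-point distribution over joints doubles as skinning weights. Class names, dimensions, and the joint count are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn

class RigPredictor(nn.Module):
    """Toy stand-in for the rigging head: per-point features are pooled
    into joint positions and per-point skinning weights. The real system
    uses PointNet++ set abstraction; this shared MLP is a simplification."""
    def __init__(self, num_joints: int = 24, feat_dim: int = 128):
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Per-point logits over joints become the skinning weights.
        self.skin_head = nn.Linear(feat_dim, num_joints)

    def forward(self, points: torch.Tensor):
        # points: (B, N, 3) mesh vertices sampled as a point cloud
        feats = self.point_mlp(points)              # (B, N, F)
        skin_weights = self.skin_head(feats).softmax(dim=-1)  # (B, N, J)
        # Joint positions as skinning-weighted vertex centroids.
        w = skin_weights / skin_weights.sum(dim=1, keepdim=True).clamp_min(1e-8)
        joints = torch.einsum("bnj,bnc->bjc", w, points)      # (B, J, 3)
        return joints, skin_weights

model = RigPredictor()
verts = torch.rand(2, 1024, 3)
joints, weights = model(verts)
print(joints.shape, weights.shape)  # (2, 24, 3), (2, 1024, 24)
```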
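
The motion stage can be sketched as a transformer that denoises a noisy motion sequence conditioned on a diffusion timestep and a sentence embedding. Prepending the text embedding as an extra token is one common conditioning scheme; the actual model is topology-aware and sized differently, so treat everything below (names, dimensions, the 384-d SBERT-style embedding) as assumptions.

```python
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    """Illustrative transformer denoiser for text-conditioned motion
    diffusion; not the project's exact architecture."""
    def __init__(self, motion_dim: int = 72, d_model: int = 256,
                 text_dim: int = 384, num_layers: int = 4):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)  # e.g. SBERT 384-d
        self.time_embed = nn.Embedding(1000, d_model)  # diffusion steps
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out_proj = nn.Linear(d_model, motion_dim)

    def forward(self, noisy_motion, t, text_emb):
        # noisy_motion: (B, T, motion_dim); t: (B,); text_emb: (B, text_dim)
        x = self.in_proj(noisy_motion) + self.time_embed(t)[:, None, :]
        cond = self.text_proj(text_emb)[:, None, :]    # text as one token
        h = self.encoder(torch.cat([cond, x], dim=1))
        return self.out_proj(h[:, 1:])                 # predicted noise

denoiser = MotionDenoiser()
x_t = torch.randn(2, 60, 72)           # 60 frames of pose features
t = torch.randint(0, 1000, (2,))
text = torch.randn(2, 384)             # placeholder text embedding
eps_pred = denoiser(x_t, t, text)
print(eps_pred.shape)                  # (2, 60, 72)
```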
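
The two training losses are standard enough to write out: a geodesic distance between predicted and ground-truth joint rotations, and a symmetric InfoNCE loss that pulls paired motion/text embeddings together while pushing mismatched pairs apart. The project's exact variants may differ; this is a minimal sketch.

```python
import torch
import torch.nn.functional as F

def geodesic_loss(R_pred, R_gt, eps=1e-7):
    """Mean geodesic angle between rotation matrices of shape (..., 3, 3):
    angle = arccos((trace(R_pred @ R_gt^T) - 1) / 2)."""
    R = R_pred @ R_gt.transpose(-1, -2)
    trace = R.diagonal(dim1=-2, dim2=-1).sum(-1)
    cos = ((trace - 1.0) / 2.0).clamp(-1.0 + eps, 1.0 - eps)
    return torch.acos(cos).mean()

def info_nce(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over paired (B, D) embeddings: matched pairs sit
    on the diagonal of the similarity matrix and act as positives."""
    m = F.normalize(motion_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = m @ t.T / temperature                   # (B, B)
    labels = torch.arange(len(m), device=m.device)
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

R_pred = torch.linalg.qr(torch.randn(8, 3, 3)).Q    # random orthonormal mats
R_gt = torch.linalg.qr(torch.randn(8, 3, 3)).Q
print(geodesic_loss(R_pred, R_gt).item())
print(info_nce(torch.randn(8, 64), torch.randn(8, 64)).item())
```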
Key Metrics & Results
- Achieved 92% armature classification accuracy on the Truebones Zoo dataset.
- The pipeline successfully automates the full workflow from static mesh to animated character driven by text descriptions.
- Published results in the “Text2Motion” article on Medium.
Tech Stack
- Python, PyTorch, Transformers, CUDA, OpenCV, SBERT, T5
