
# Time-to-Move

**Training-Free Motion-Controlled Video Generation via Dual-Clock Denoising**

Assaf Singer · Noam Rotstein · Amir Mann · Ron Kimmel · Or Litany

*Equal contribution*

Project Page · [arXiv Paper](https://arxiv.org/abs/2511.08633)


*Comparison videos: warped input vs. ours (two side-by-side examples).*

## Table of Contents

- [Inference](#inference)
  - [Dual Clock Denoising Guide](#dual-clock-denoising)
  - [Wan](#wan)
  - [CogVideoX](#cogvideox)
  - [Stable Video Diffusion](#stable-video-diffusion)
- [Generate Your Own Cut-and-Drag Examples](#generate-your-own-cut-and-drag-examples)
  - [GUI guide](GUIs/README.md)
- [Community Adoption](#community-adoption)
- [TODO](#todo)
- [BibTeX](#bibtex)

## Inference

**Time-to-Move (TTM)** is a plug-and-play technique that can be integrated into any image-to-video diffusion model. We provide implementations for **Wan 2.2**, **CogVideoX**, and **Stable Video Diffusion (SVD)**. As expected, the stronger the base model, the better the resulting videos. Adapting TTM to new models and pipelines is straightforward and can typically be done in just a few hours.

We **recommend using Wan**, which generally produces higher-quality results and adheres more faithfully to user-provided motion signals.

For each model, you can use the [included examples](./examples/) or create your own as described in [Generate Your Own Cut-and-Drag Examples](#generate-your-own-cut-and-drag-examples).

### Dual Clock Denoising

TTM relies on two hyperparameters that determine the noise depth at which different regions of the video start denoising. In practice, we do not pass `tweak` and `tstrong` as raw timesteps. Instead, we pass `tweak-index` and `tstrong-index`, which indicate the iteration at which each denoising phase begins, out of the total `num_inference_steps` (50 for all models). Constraint: `0 ≤ tweak-index ≤ tstrong-index ≤ num_inference_steps`.

* **tweak-index** — the iteration at which denoising **outside the mask** begins.
  - Too low: scene deformations, object duplication, or unintended camera motion.
  - Too high: regions outside the mask look static (e.g., non-moving backgrounds).
* **tstrong-index** — the iteration at which denoising **within the mask** begins. In our experience, the best value depends on the size and quality of the mask.
  - Too low: the object may drift from the intended path.
  - Too high: the object may look rigid or over-constrained.

A minimal, illustrative sketch of this two-clock schedule is shown after the Wan example below.

### Wan

To set up the environment for running Wan 2.2, follow the installation instructions in the official [Wan 2.2 repository](https://github.com/Wan-Video/Wan2.2). Our implementation builds on the [🤗 Diffusers Wan I2V pipeline](https://github.com/huggingface/diffusers/blob/345864eb852b528fd1f4b6ad087fa06e0470006b/src/diffusers/pipelines/wan/pipeline_wan_i2v.py), which we adapt for TTM using the I2V 14B backbone.

#### Run inference (using the included Wan examples):

```bash
python run_wan.py \
  --input-path "./examples/cutdrag_wan_Monkey" \
  --output-path "./outputs/wan_monkey.mp4" \
  --tweak-index 3 \
  --tstrong-index 7
```

#### Good starting points:

* Cut-and-Drag: `tweak-index=3`, `tstrong-index=7`
* Camera control: `tweak-index=2`, `tstrong-index=5`
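To make the two-clock schedule concrete, here is a minimal, model-agnostic sketch of a dual-clock denoising loop. It is *not* the actual pipeline code: `dual_clock_denoise`, `add_noise`, and `denoise_step` are hypothetical placeholders standing in for the scheduler and model calls inside the adapted Diffusers pipelines. It only illustrates how `tweak-index` and `tstrong-index` gate when each region stops being re-pinned to the noised motion signal and starts ordinary denoising.

```python
import torch

def dual_clock_denoise(warped_latents, mask, timesteps,
                       tweak_index, tstrong_index,
                       denoise_step, add_noise):
    """Illustrative dual-clock loop; `denoise_step` and `add_noise` are placeholders.

    warped_latents : latents of the crudely warped input video (the motion signal)
    mask           : 1.0 inside the user-drawn object region, 0.0 outside
    timesteps      : scheduler timesteps ordered from most to least noisy
    tweak_index    : iteration at which the region outside the mask starts denoising
    tstrong_index  : iteration at which the region inside the mask starts denoising
    """
    assert 0 <= tweak_index <= tstrong_index <= len(timesteps)

    # Begin from the warped latents noised to the first (highest) noise level.
    latents = add_noise(warped_latents, timesteps[0])

    for i, t in enumerate(timesteps):
        # Regions whose clock has not started yet are re-pinned to the warped
        # latents, re-noised to the current noise level.
        anchor = add_noise(warped_latents, t)
        if i < tweak_index:
            latents = anchor                                   # everything pinned
        elif i < tstrong_index:
            latents = mask * anchor + (1.0 - mask) * latents   # only the object pinned
        # Ordinary denoising update for the current step.
        latents = denoise_step(latents, t)

    return latents


# Toy usage with dummy stand-ins (no real diffusion model involved).
if __name__ == "__main__":
    warped = torch.randn(1, 4, 16, 8, 8)                 # (B, C, T, H, W) stand-in
    mask = torch.zeros_like(warped[:, :1])
    mask[..., 2:6, 2:6] = 1.0                             # a small moving-object box
    steps = list(range(50, 0, -1))                        # 50 inference steps
    out = dual_clock_denoise(
        warped, mask, steps, tweak_index=3, tstrong_index=7,
        add_noise=lambda x, t: x + (t / 50.0) * torch.randn_like(x),
        denoise_step=lambda x, t: 0.98 * x,               # placeholder "denoiser"
    )
    print(out.shape)
```

Before `tweak-index` the whole latent follows the motion signal; between `tweak-index` and `tstrong-index` only the masked object does; afterwards everything is denoised freely.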
### CogVideoX
To set up the environment for running CogVideoX, follow the installation instructions in the official [CogVideoX repository](https://github.com/zai-org/CogVideo). Our implementation builds on the [🤗 Diffusers CogVideoX I2V pipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py), which we adapt for Time-to-Move (TTM) using the CogVideoX-I2V 5B backbone.

#### Run inference (on the included 49-frame CogVideoX example):

```bash
python run_cog.py \
  --input-path "./examples/cutdrag_cog_Monkey" \
  --output-path "./outputs/cog_monkey.mp4" \
  --tweak-index 4 \
  --tstrong-index 9
```

### Stable Video Diffusion
To set up the environment for running SVD, follow the installation instructions in the official [SVD repository](https://github.com/Stability-AI/generative-models). Our implementation builds on the [🤗 Diffusers SVD I2V pipeline](https://github.com/huggingface/diffusers/blob/8abc7aeb715c0149ee0a9982b2d608ce97f55215/src/diffusers/pipelines/stable_video_diffusion/pipeline_stable_video_diffusion.py#L147), which we adapt for Time-to-Move (TTM).

#### Run inference (on the included 21-frame SVD example):

```bash
python run_svd.py \
  --input-path "./examples/cutdrag_svd_Fish" \
  --output-path "./outputs/svd_fish.mp4" \
  --tweak-index 16 \
  --tstrong-index 21
```
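Because good values of `tweak-index` and `tstrong-index` depend on the mask and the motion signal (see the Dual Clock Denoising guide above), it can be worth sweeping a few pairs around the recommended starting points. The snippet below is a hypothetical helper, not part of this repository; it simply calls the documented CLI (shown here for `run_wan.py`; `run_cog.py` and `run_svd.py` take the same flags) once per pair.

```python
import subprocess
from pathlib import Path

# Hypothetical sweep helper: one output video per (tweak-index, tstrong-index) pair.
INPUT = "./examples/cutdrag_wan_Monkey"
OUT_DIR = Path("./outputs/sweep_wan_monkey")
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Pairs around the recommended cut-and-drag starting point (3, 7).
pairs = [(2, 6), (3, 7), (3, 9), (4, 8)]

for tweak, tstrong in pairs:
    assert 0 <= tweak <= tstrong <= 50          # num_inference_steps = 50
    out_path = OUT_DIR / f"tweak{tweak}_tstrong{tstrong}.mp4"
    subprocess.run(
        ["python", "run_wan.py",
         "--input-path", INPUT,
         "--output-path", str(out_path),
         "--tweak-index", str(tweak),
         "--tstrong-index", str(tstrong)],
        check=True,
    )
```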

## Generate Your Own Cut-and-Drag Examples

We provide an easy-to-use GUI for creating cut-and-drag examples that can later be used for video generation in **Time-to-Move**. We recommend reading the [GUI guide](GUIs/README.md) before using it.

*Cut-and-Drag GUI example.*

To get started quickly, create a new environment and run:

```bash
pip install PySide6 opencv-python numpy imageio imageio-ffmpeg
python GUIs/cut_and_drag.py
```
## Community Adoption

- [ComfyUI – WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper) by [@kijai](https://github.com/kijai): native TTM nodes and an example Wan 2.2 I2V workflow.
- [Wan 2.2 Time-To-Move ComfyUI Guide](https://www.youtube.com/watch?v=NcuUR7hrn-Q): YouTube tutorial by **Benji’s AI Playground**.
- [ComfyUI – WanVideoWrapper spline editor](https://github.com/siraxe/ComfyUI-WanVideoWrapper_QQ) by [@siraxe](https://github.com/siraxe): keyframe-based editor and input-video assembly tool.

If you are using TTM in your own project or product, feel free to open a PR to add it to this section.

## TODO 🛠️

- [x] Wan 2.2 run code
- [x] CogVideoX run code
- [x] SVD run code
- [x] Cut-and-Drag examples
- [x] Camera-control examples
- [x] Cut-and-Drag GUI
- [x] Cut-and-Drag GUI guide
- [ ] Evaluation code

## BibTeX

```bibtex
@misc{singer2025timetomovetrainingfreemotioncontrolled,
  title={Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising},
  author={Assaf Singer and Noam Rotstein and Amir Mann and Ron Kimmel and Or Litany},
  year={2025},
  eprint={2511.08633},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.08633},
}
```