# FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks
Official Code Repository for **FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks**.
[Chongkai Gao](https://chongkaigao.com/)<sup>1</sup>, [Haozhuo Zhang](https://haozhuo-zhang.github.io/)<sup>2</sup>, [Zhixuan Xu](https://ariszxxu.github.io/)<sup>1</sup>, Zhehao Cai<sup>1</sup>, [Lin Shao](https://linsats.github.io/)<sup>1</sup>
<sup>1</sup>National University of Singapore, <sup>2</sup>Peking University
In this paper, we present FLIP, a model-based planning algorithm in visual space that features three key modules: (1) a multi-modal flow generation model as the general-purpose action proposal module; (2) a flow-conditioned video generation model as the dynamics module; and (3) a vision-language representation learning model as the value module. Given an initial image and a language instruction as the goal, FLIP progressively searches for long-horizon flow and video plans that maximize the discounted return to accomplish the task. FLIP can synthesize long-horizon plans across objects, robots, and tasks with image flows as the general action representation, and the dense flow information also provides rich guidance for long-horizon video generation. In addition, the synthesized flow and video plans can guide the training of low-level control policies for robot execution.
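At a high level, the search described above can be pictured as follows. This is only an illustrative sketch: the module interfaces (`propose_flows`, `rollout_video`, `score`) are hypothetical placeholders for the action, dynamics, and value modules, and the actual search in `scripts/hill_climbing.py` differs in detail.
```
# Illustrative sketch of FLIP's model-based search over flow/video plans.
# propose_flows, rollout_video, and score are hypothetical placeholders for
# the action, dynamics, and value modules described above.
def plan(image, instruction, action_model, dynamics_model, value_model,
         horizon=5, num_candidates=8, gamma=0.99):
    plan_flows, plan_frames = [], [image]
    current = image
    for t in range(horizon):
        best_flow, best_frame, best_value = None, None, float("-inf")
        # 1. The action module proposes candidate image flows from the current frame.
        for flow in action_model.propose_flows(current, instruction, n=num_candidates):
            # 2. The dynamics module generates the next frame(s) conditioned on the flow.
            next_frame = dynamics_model.rollout_video(current, flow)
            # 3. The value module scores progress toward the language goal.
            value = (gamma ** t) * value_model.score(next_frame, instruction)
            if value > best_value:
                best_flow, best_frame, best_value = flow, next_frame, value
        plan_flows.append(best_flow)
        plan_frames.append(best_frame)
        current = best_frame
    return plan_flows, plan_frames
```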
## Installation
### 1. Create Python Environment
```
conda create -n flip python==3.8
conda activate flip
```
### 2. Install Dependencies
```
pip install -r requirements.txt
```
### 3. Download CoTracker V2 Checkpoint
```
cd flip/co_tracker
wget https://huggingface.co/facebook/cotracker/resolve/main/cotracker2.pth
```
### 4. Download Meta Llama 3.1 8B
1. Get the download access from https://huggingface.co/meta-llama/Llama-3.1-8B.
2. Put the downloaded folder at the repository root (`./`). You should end up with a file structure like this:
```
...
- liv
- llama_models
- Meta-Llama-3.1-8B
- consolidated.00.pth
- params.json
- tokenizer.model
- scripts
...
```
### 5. Download LIV Pretrained Models
1. Download `model.pt` and `config.yaml` according to [`liv/__init__.py#L33`](https://github.com/penn-pal-lab/LIV/blob/main/liv/__init__.py#L33) in the LIV repository.
2. `mkdir liv/resnet50`.
3. Put the `model.pt` and `config.yaml` under `liv/resnet50`. You should have a file structure like this:
```
...
- liv
- cfgs
- dataset
- examples
- models
- resnet50
- config.yaml
- model.pt
- utils
__init__.py
train_liv.py
trainer.py
- llama_models
...
```
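Optionally, you can sanity-check the two files before finetuning; this sketch assumes `config.yaml` is plain YAML and `model.pt` is a standard `torch.save` checkpoint:
```
# Optional sanity check for the LIV assets placed under liv/resnet50.
from pathlib import Path
import torch
import yaml

liv_dir = Path("liv/resnet50")
with open(liv_dir / "config.yaml") as f:
    cfg = yaml.safe_load(f)
print("config keys:", list(cfg)[:5])

state = torch.load(liv_dir / "model.pt", map_location="cpu")
print("checkpoint type:", type(state))
```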
## Data Preparation
### 1. Download the LIBERO-LONG Dataset
1. `wget https://utexas.box.com/shared/static/cv73j8zschq8auh9npzt876fdc1akvmk.zip`
2. `mkdir data/libero_10`
3. Unzip the archive and put the ten LIBERO-10 hdf5 files into `data/libero_10`. You can inspect one of them with the sketch below.
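If you want to confirm the download, the following sketch lists the internal structure of one demo file; the filename is a placeholder, so substitute any of the ten hdf5 files you unzipped:
```
# Optional: print every group/dataset path inside one LIBERO-10 hdf5 file.
import h5py

with h5py.File("data/libero_10/YOUR_TASK_DEMO.hdf5", "r") as f:  # placeholder filename
    f.visit(print)
```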
### 2. Replay
`python scripts/replay_libero_data_from_hdf5.py`
By default, the resolution is 128 × 128.
### 3. Flow Tracking
`python scripts/video_tracking.py`.
By default, only the agentview demos are tracked. Set `eye_in_hand` to `true` in `config/libero_10/tracking.yaml` to also track the eye-in-hand demos.
### 4. Data Preprocessing
`python scripts/preprocess_data_to_hdf5.py`.
By default, only the agentview demos are preprocessed. Set `eye_in_hand` to `true` in `config/libero_10/preprocess.yaml` to also preprocess the eye-in-hand demos.
## Training
### 1. Train the Flow Generation Model (Action Module)
`torchrun --nnodes=1 --nproc_per_node=2 scripts/train_cvae.py`
You can edit `config/libero_10/cvae.yaml` for custom training. The current config targets A100 40 GB GPUs.
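If you are running on fewer or smaller GPUs, you can presumably lower `--nproc_per_node` (e.g. `torchrun --nnodes=1 --nproc_per_node=1 scripts/train_cvae.py`) and reduce the batch size in `config/libero_10/cvae.yaml` accordingly; the exact field names depend on that config file.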
### 2. Train the Video Generation Model (Dynamics Module)
`torchrun --nnodes=1 --nproc_per_node=2 scripts/train_dynamics.py`
You can edit `config/libero_10/dynamics.yaml` for custom training. The current config targets A100 40 GB GPUs.
### 3. Finetune the LIV Model with Video Clips (Value Module)
`python scripts/finetune_liv.py`
This script first builds a LIV dataset and then trains on it.
You may change the configs in `config/libero_10/finetune_liv.yaml`, `liv/cfgs/dataset/libero_10.yaml`, and `liv/cfgs/training/finetune.yaml` to match your own tasks.
### 4. Finetune the Pretrained VAE Encoder (for the Dynamics Module)
`torchrun --nnodes=1 --nproc_per_node=8 scripts/finetune_vae.py`
You can change `config/libero_10/finetune_vae.yaml` for custom training.
## Testing
1. `mkdir -p models/libero_10`
2. Put all the trained models (`agentview_dynamics.pt`, `cvae.pt`, `finetuned_vae.pt`, `reward.pt`) under `models/libero_10`. A quick pre-flight check is sketched below.
3. `torchrun scripts/hill_climbing.py`
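Before launching the search, you may want to confirm that all four checkpoints are in place; the filenames below are taken from step 2 above:
```
# Optional pre-flight check before running scripts/hill_climbing.py.
from pathlib import Path

models_dir = Path("models/libero_10")
for name in ["agentview_dynamics.pt", "cvae.pt", "finetuned_vae.pt", "reward.pt"]:
    path = models_dir / name
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```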
## Separate Testing of Action Module and Dynamics Module
1. Action Module: `python scripts/eval_cvae.py`.
2. Dynamics Module: `scripts/train_dynamics.py`.
## Citation
If you find our code or models useful in your work, please cite [our paper](https://nus-lins-lab.github.io/flipweb/):
```
TODO
```
## Contact
If you have any questions, feel free to contact me via email ([gaochongkai@u.nus.edu](mailto:gaochongkai@u.nus.edu))!