# D2E

**Repository Path**: monkeycc/D2E

## Basic Information

- **Project Name**: D2E
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-03
- **Last Updated**: 2025-12-03

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI

Suhwan Choi\*, Jaeyoon Jung\*, Haebin Seong\*, Minchan Kim, Minyeong Kim, Yongjun Cho, Yoonshik Kim, Yubeen Park, Youngjae Yu‡, Yunsung Lee‡

[![project-page](https://img.shields.io/badge/Project%20Page-blue?style=flat-square)](https://worv-ai.github.io/d2e/) [![arXiv](https://img.shields.io/badge/arXiv-2510.05684-brightgreen.svg?style=flat-square)](https://arxiv.org/abs/2510.05684)

Code is coming soon! The `ocap` and `owa` toolkits are already open-sourced; have a look at them first.

- https://github.com/open-world-agents/ocap: ocap (Omnimodal CAPture) captures all essential desktop signals in a synchronized format: screen video, audio, keyboard/mouse input, and window events.
- https://github.com/open-world-agents/open-world-agents: A versatile and efficient monorepo that embraces and grows multiple projects, containing all the essential building blocks for agent development.
- https://worv-ai.github.io/d2e/: D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. Code is coming soon!

## Citation

If you find this work useful, please cite our paper:

```
@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suhwan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
```