ByteDance Open Source
A lightweight native unified multimodal model for image and video understanding, generation, and editing.
Recipes for the ConanCenter repository
A GUI agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.