Design and fine-tune large-scale VLMs/LLMs for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
Must-have
Nice-to-have
Not specified