This role sits at the center of research efforts to shape training objectives and architectures for joint image, video, and audio foundation models.
Must-have
Nice-to-have
Not specified