This new machine-learning model can match corresponding audio and visual data, which could someday help robots interact in the real world. “We are building AI systems that can process the world like humans do, in terms of having both audio and visual information...
