grounding

A large, longitudinal audiovisual dataset recorded from the infant’s perspective

I put a camera on my toddler's head and this is the result

Visual and affective multimodal models of word meaning in language and mind

Demonstrates that models incorporating visual and affective information capture human representations better than models built from text corpora alone