A large, longitudinal audiovisual dataset recorded from the infant’s perspective

Jessica Sullivan, Michelle Mei, Andrew Perfors, Erica Wojcik, Michael Frank

June 2021


Abstract

We introduce a new resource: the SAYCam corpus. Infants aged 6–32 months wore a head-mounted camera for approximately 2 hr per week, over the course of approximately two-and-a-half years. The result is a large, naturalistic, longitudinal dataset of infant- and child-perspective videos. Over 200,000 words of naturalistic speech have already been transcribed. In addition, the dataset is searchable using a number of criteria (e.g., age of participant, location, setting, objects present). The resulting dataset will be of broad use to psychologists, linguists, and computer scientists.

Type

Journal article

Publication

Open Mind 5: 20-29
