Word prediction in context: An empirical investigation of core vocabulary

Andrew Wang, Simon De Deyne, Meredith McKague, Andrew Perfors

February 2023


Abstract

Core vocabulary is a topic of considerable interest in linguistics and has been studied from a wide variety of perspectives, including language learning, dictionary studies, and cross-linguistic comparison. In many of these conceptions, word frequency is the conventional measure of a word’s coreness; however, this approach overlooks important aspects of mental representation, such as a word’s centrality in an associative semantic network. In this experiment, we compare different approaches to defining core words in a task that involves predicting missing words in sentences. Results showed that core words (regardless of definition) were easier to guess than non-core words, but that frequency-defined core words did not perform as well as expected given their higher predictability and the nature of the task. Analysis of incorrect responses also showed that people preferred to guess core words, simple synonyms, and words taxonomically related to the target. The findings suggest that how core vocabulary is defined depends in part on the nature of the task, and that aspects of both mental representation and the linguistic environment play an important role.

Type

Preprint

Publication

In M. Goldwater, F. Anggoro, B. Hayes, & D. Ong (Eds.), Proceedings of the 45th Annual Conference of the Cognitive Science Society. Manuscript under review.
