Are the most frequent words the most useful? Investigating core vocabulary in reading

Abstract

High-frequency words are often assumed to be the most useful words for communication, because they provide the greatest coverage of texts. However, the relationship between text coverage and comprehension may not be straightforward: some words may carry more information than others. In this study, we explore alternative methods of defining core vocabulary alongside word frequency (e.g., words that are central hubs in semantic association networks). We report the results of an empirical test of communicative utility using a text-based guessing game. We show that core words reflecting corpus-based distributional statistics (such as frequency or co-occurrence centrality) were less useful for communication than core words defined in other ways. This was evident both in the size of the vocabulary that must be known and in the proportion of the text that must be covered for successful communication.
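The two corpus-based notions the abstract contrasts, text coverage by a frequency-ranked vocabulary and centrality in a co-occurrence network, can be made concrete with a small sketch. This is not the authors' pipeline: the regex tokenizer, the sentence-level co-occurrence window, and the use of degree centrality are all illustrative assumptions, and the helper names (`frequency_core`, `coverage`, `cooccurrence_degree`, `centrality_core`) are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase word tokens; a deliberately simple regex tokenizer (assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_core(tokens, n):
    """Frequency-based core vocabulary: the n most frequent word types."""
    return {word for word, _ in Counter(tokens).most_common(n)}


def coverage(tokens, vocabulary):
    """Text coverage: proportion of running tokens whose type is in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in vocabulary) / len(tokens)


def cooccurrence_degree(sentences):
    """Degree centrality in a sentence-level co-occurrence network (assumed window):
    each word type is linked to every other type it shares a sentence with."""
    neighbours = {}
    for sentence in sentences:
        types = sorted(set(tokenize(sentence)))
        for w in types:
            neighbours.setdefault(w, set())
        for a, b in combinations(types, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {w: len(ns) for w, ns in neighbours.items()}


def centrality_core(sentences, n):
    """Centrality-based core vocabulary: the n types with the most distinct neighbours
    (ties broken alphabetically)."""
    degree = cooccurrence_degree(sentences)
    ranked = sorted(degree, key=lambda w: (-degree[w], w))
    return set(ranked[:n])


# Toy corpus, purely for illustration.
sentences = ["the cat sat on the mat", "the dog sat on the log", "a cat and a dog"]
tokens = [t for s in sentences for t in tokenize(s)]

print(coverage(tokens, frequency_core(tokens, 1)))  # "the" covers 4 of 17 tokens, about 0.235
print(sorted(centrality_core(sentences, 2)))        # ['cat', 'dog']
```

Even on this toy corpus the two definitions pick different "core" words: the single most frequent word is a function word, while the most connected words are content words that appear in several contexts. Neither measure captures the semantic association networks mentioned above, which would require association norms rather than corpus co-occurrence.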

Publication
In L. Samuelson, S. Frank, M. Toneva, A. Mackey, and E. Hazeltine (Eds.), Proceedings of the 46th Annual Conference of the Cognitive Science Society (pp. 1257-1263).
