Friday, October 11 • 3:40pm - 5:30pm
This paper uses statistical word embeddings, namely Word2vec (Mikolov et al., 2013; Goldberg and Levy, 2014), to study the extension of estar in three Spanish-speaking communities. Two of the corpora are sub-corpora of PRESEEA (2014-): 97 interviews from Spain (760,929 words) and 69 interviews from Mexico (597,916 words). The third, CESA (Carvalho, 2012-), consists of 76 interviews (498,711 words) of bilingual Spanish from Southern Arizona. Using word embeddings trained on these corpora, distances between target lexical items (e.g., adjectives) and all forms of ser and estar were calculated and then used to measure each word's estar preference (i.e., its distance to ser minus its distance to estar). The results confirm some previous findings (Bessett, 2015; Cortés-Torres, 2004; Geeslin and Guijarro-Fuentes, 2008; Salazar, 2007; Silva-Corvalán, 1986), showing significant differences in the extension of estar across the three corpora.
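The estar preference measure can be illustrated with a minimal sketch. The abstract does not state which distance metric was used or how distances to the individual forms of ser and estar were combined, so this example assumes cosine distance and a simple mean over the available forms. The helper names (`cosine_distance`, `estar_preference`) and the two-dimensional toy vectors are hypothetical and serve only as illustration; they are not data from PRESEEA or CESA. A positive score means the word lies closer to estar than to ser.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance(word_vec: Vector, form_vecs: List[Vector]) -> float:
    """Average cosine distance from one word to a set of verb forms (assumed aggregation)."""
    if not form_vecs:
        raise ValueError("no verb forms found in the embedding vocabulary")
    return sum(cosine_distance(word_vec, f) for f in form_vecs) / len(form_vecs)


def estar_preference(
    word: str,
    embeddings: Dict[str, Vector],
    ser_forms: List[str],
    estar_forms: List[str],
) -> float:
    """Distance to ser minus distance to estar; positive values favor estar."""
    vec = embeddings[word]
    # Skip forms that did not make it into the model's vocabulary.
    d_ser = mean_distance(vec, [embeddings[f] for f in ser_forms if f in embeddings])
    d_estar = mean_distance(vec, [embeddings[f] for f in estar_forms if f in embeddings])
    return d_ser - d_estar


# Hypothetical toy vectors: the first axis stands in for "ser-like" contexts,
# the second for "estar-like" contexts.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "es": [1.0, 0.0],
    "son": [1.0, 0.0],
    "está": [0.0, 1.0],
    "están": [0.0, 1.0],
    "contento": [0.2, 1.0],
    "alto": [1.0, 0.1],
}
SER_FORMS = ["es", "son"]
ESTAR_FORMS = ["está", "están"]

for w in ("contento", "alto"):
    print(w, round(estar_preference(w, TOY_EMBEDDINGS, SER_FORMS, ESTAR_FORMS), 3))
```

With real embeddings, the vectors would come from a Word2vec model trained on each corpus, and comparing the scores for the same adjective across the Spain, Mexico, and Arizona models is what would reveal differences in the extension of estar.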


