Collaborative Research on the Geography of English Dialect Features by Self-Organizing Maps
Jean-Claude Thill, Principal Investigator
This collaborative project between a geographer and a linguist will provide a thorough spatio-linguistic analysis of the variations of word usage and pronunciation in the Middle and South Atlantic States. It is intended as a contribution to linguistic geography, empirical dialectology, and computational geography. Conventional methods of quantitative spatial analysis have been found to be ill-suited to analyzing the Linguistic Atlas of Middle and South Atlantic States (LAMSAS) databases, which were accumulated over several decades of extensive field work. The project will therefore employ a neural network model called the Self-Organizing Map (SOM), which has never been used to analyze linguistic data, and the study will also serve as a test of the robustness of SOM on geographic problems. The following specific linguistic and dialectological questions will be addressed:
1. How are linguistic features (whether words, grammatical constructions, pronunciations, or combinations of these) distributed over geographic areas, if not in relatively uniform patterns of complementary distribution?
2. Are the configuration and density of the road network critical elements in the creation and shaping of dialectal regions?
3. Since individual linguistic features can be shown to have specific distributions in geographic space, is there some relatively small group of linguistic features that can be perceived as most "salient" in the identification of regional differences?
4. What is the congruence between isoglosses posited by traditional, subjective methods for simple linguistic features (lexical, pronunciation, or grammatical) and the "fuzzy" multidimensional clusters derived from SOMs?
5. What is the interaction of personal/social variables such as sex, age, ethnicity, occupation, and education with geographic variables such as location and community type (urban/rural)?
The project is an empirical study of language variation across geographic areas and socio-demographic groups, conducted on the largest digital database of language features in North America. Traditional dialectologists have assumed that there were such things as regional dialects and that they only needed to find diagnostic linguistic features to demonstrate the presence and boundaries of the dialect areas. The primary tool of traditional dialectologists was the isogloss, which was supposed to represent the geographical limit of occurrence of some linguistic feature; bundles of isoglosses that ran in about the same place were taken to be dialect boundaries. Recently, quantitative approaches have indicated that linguistic variation is considerably more complex than this picture suggests. This project will rely on novel computational techniques of data compression to further our understanding of the mathematical underpinnings of language in the distribution of features across territory and social dimensions. By not starting with the assumption of the systematicity of language and dialect that generativists, traditional dialectologists, and sociolinguists share (i.e., that all speakers of a language or dialect share essentially the same rules and inventories of linguistic features), this research has the potential to build a multidimensional cultural model in which language is not removed from society as an independent system but is instead well integrated with other regional and social characteristics.
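The abstract names the Self-Organizing Map but does not describe how one works. As a rough illustration only, and not the project's implementation, the following minimal sketch trains a small rectangular SOM in plain Python. The informants, the four binary features, the grid size, and the learning-rate and neighbourhood schedules are all hypothetical choices made for this example; none of them comes from LAMSAS or from the project.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to x (squared Euclidean)."""
    return min(
        weights,
        key=lambda node: sum((wi - xi) ** 2 for wi, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) -> weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius (assumed schedule)
    # Initialise every grid node with random weights in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # radius shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in feature space.
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical toy data: each row is one informant, each column one binary
# feature (1 = informant uses the variant). Not real atlas data.
group_a = [[1, 1, 0, 0]] * 3
group_b = [[0, 0, 1, 1]] * 3
weights = train_som(group_a + group_b, rows=3, cols=3)

node_a = best_matching_unit(weights, group_a[0])
node_b = best_matching_unit(weights, group_b[0])
print("group A maps to node", node_a)
print("group B maps to node", node_b)
```

Each informant is assigned to the grid node with the nearest weight vector, so informants with similar feature profiles tend to fall on the same or neighbouring nodes. That is the sense in which a SOM compresses many linguistic features into a low-dimensional map that can then be compared with geographic location.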