Postal address: Språkbanken, Institutionen för svenska språket, Göteborgs universitet, Box 200, SE-405 30 Göteborg
Visiting address: Lennart Torstenssonsgatan 8
Web page: http://spraakbanken.gu.se/eng/start
Contact person: Lars Borin, Professor of Natural Language Processing, Deputy Head of Department responsible for research-related matters, and Director of the Swedish Language Bank, phone: +46 (0)31 786 4544
Språkbanken (the Swedish Language Bank) is a nationally and internationally acknowledged research unit at the Department of Swedish, University of Gothenburg, established in 1975 in recognition of the groundbreaking corpus linguistic work initiated by Sture Allén. Our work focuses on language technology, in particular methodologies for handling the Swedish language, and the development of linguistic resources and tools for Swedish. These language resources are made available to researchers in language technology and other disciplines, as well as to the general public.
South Asia-related research
On 30 October 2014, Lars Borin was awarded a project grant for linguistic research from the Swedish Research Council (SEK 7.34 million in total over five years, 2015-19). The project, entitled “South Asia as a linguistic area? Exploring big-data methods in areal and genetic linguistics”, will be carried out in collaboration with Professor Anju Saxena at the Division for Linguistics and Computer Linguistics (including Language Technology), Department of Linguistics and Philology, Uppsala University.
Project abstract: Linguistics is entering the age of big data and e-science. We are now at a point where it is possible to see how new research questions can be formulated, and old research questions addressed from a new angle or established results verified, on the basis of exhaustive collections of data rather than small samples. In this project we will study an old linguistic research question using big data: the South Asian linguistic area hypothesis. South Asia (SA) is regularly mentioned in the literature as a classic linguistic area, but systematic investigations of this claim are lacking, and the need for a more thorough study has been stressed repeatedly. Carrying out such a study is the primary empirical objective of the project. Grierson’s “Linguistic Survey of India” (LSI; 1903-1927) remains the most complete source on SA languages. Its 21 tomes (9,500 pages) cover 723 SA linguistic varieties. Comparable lexical and grammatical information is provided for 268 varieties representing the four major SA language families. Prof. Borin and Prof. Saxena will work with an extensive data set extracted from a digitized version of LSI and develop computational methods for conducting large-scale quantitative lexical and grammatical comparative studies in order to establish typological profiles of the four families. They will also complement these large-scale studies with an in-depth case study focusing on a particular language contact situation. This will give a firm empirical basis for evaluating the SA areal hypothesis.