Project Summary

The ERC Advanced Grant project "Language Evolution: The Empirical Turn" will be funded by the European Research Council for a period of five years, starting April 2013. The project pursues a highly interdisciplinary approach to the empirical study of cultural language evolution. It draws on ideas and methods from historical linguistics and typology, natural language processing, biology, bioinformatics, computer science, and statistics.

The computer-aided study of cultural language evolution has grown tremendously over the past fifteen years. This growth comprises both model-driven approaches, which study the long-term, population-wide consequences of design assumptions about language production, comprehension, and learning, and data-driven approaches, which employ algorithmic techniques from bioinformatics to recover otherwise inaccessible information about language history. At the current juncture, the field faces two challenges:

  • The specific properties of language evolution, which shows parallels with but also key differences from biological evolution, require central attention.
  • Model-driven and data-driven approaches need to inform each other to achieve explanatory power and to assess the statistical significance of the findings.

The project will establish a radically data-oriented framework for the study of language evolution. This includes three aspects:

  • replacing the off-the-shelf bioinformatics tools currently used in computational language classification with linguistically informed algorithms, especially multiple sequence alignment techniques,
  • identifying characteristic traits of language evolution via exploratory data analysis, guided by the theory of complex systems and employing cutting-edge methods from machine learning such as kernel methods and causal inference, and
  • developing, implementing, and testing models of language evolution that correctly predict its statistical fingerprints, i.e. models that pay sufficient attention to the domain-specific features of language evolution that have no counterpart in biological evolution.

Work Packages

  • WP1: Data acquisition and maintenance
  • WP2: Sequence alignment and phylogeny reconstruction
  • WP3: Hidden Markov Models and reconstruction of ancestral forms
  • WP4: Kernel methods, conditional independence and causality
  • WP5: Computer simulations of language evolution
  • WP6: Statistical evaluation and visualization