Corpus Linguistics for Digital Humanities
Marijn Koolen & Maarten Marx
Language & Computation
Week One - 9.00-10.30 - Level: F
Digital analysis of large collections of human-created artefacts has become an additional tool in the Humanities, next to the traditional approach of hermeneutic interpretation of a small collection of carefully selected artefacts (``close reading''). A major difference is that close reading is done by humans, while ``distant reading'' is done by computers: humans no longer read the primary artefacts themselves, but interpret the results obtained from computational analysis of corpora. In this course we take the magic out of distant reading. After the course, the student is able to
- assess the value of results obtained from computational research in the Humanities;
- use both close and distant reading as two complementary and reinforcing tools;
- operationalize a research question as a computer-solvable task;
- design and implement a complete digital Humanities research pipeline.
Moreover, the student knows how to start a digital Humanities project, has a realistic sense of her own level of computer skills, and knows how to reach a desired level.
The course focuses on corpora consisting of texts with additional structure and metadata. Concretely, students learn how to create a corpus, how to represent text documents for computer processing, and how to compare, classify and categorize individual documents and corpora.
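To give a flavour of what representing and comparing documents can look like, here is a minimal sketch in Python. It is not taken from the course materials: the bag-of-words representation, the cosine similarity measure, the function names and the three toy sentences are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (a simple bag-of-words model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Toy corpus, invented for illustration only.
corpus = {
    "doc1": "The parliament debated the new law.",
    "doc2": "The new law was debated in parliament.",
    "doc3": "The ship sailed from Amsterdam.",
}
vectors = {name: bag_of_words(text) for name, text in corpus.items()}

# doc1 and doc2 share most of their vocabulary; doc1 and doc3 share only "the".
print(cosine_similarity(vectors["doc1"], vectors["doc2"]))
print(cosine_similarity(vectors["doc1"], vectors["doc3"]))
```

Under these assumptions the first pair scores higher than the second, which is the kind of result a researcher would then interpret through close reading of the documents involved.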
The course is very much hands-on: half of each session is devoted to practical work on the students' own laptops. We use exciting real-world data from ongoing digital Humanities research at the University of Amsterdam.