Learning outcomes
This introductory course teaches students how to carry out linguistic, historical and literary analysis of texts using techniques and tools developed in Natural Language Processing and the Digital Humanities. By the end of the course, students will be able to plan and conduct the analysis of a linguistic, historical or literary phenomenon using methods and tools from corpus linguistics and computational linguistics.
Course contents
TEXTS AND COMPUTERS
1. Introduction.
2. Digitized texts.
3. Annotated texts.
4. Schemes and levels of text annotation.
5. Exploring texts: tools for querying text corpora.
6. Digital resources for linguistic, historical and literary analysis.
7. Overview and lab session on selected projects that apply natural language processing techniques to the Digital Humanities (see the required readings).
Recommended or required readings
Textbook
Lenci, Alessandro, Simonetta Montemagni, and Vito Pirrelli. 2016. Testo e Computer. Roma: Carocci.
Chapters in volume
Jezek, Elisabetta. 2011. Lessico. Classi di parole, strutture, combinazioni. Bologna: Il Mulino. Ch. 1, "Nozioni di base" (pp. 13-46).
Pustejovsky, James, and Amber Stubbs. 2012. Natural Language Annotation for Machine Learning. O’Reilly. Ch. 1, "The Basics" (pp. 1-31); Ch. 6, "Evaluating the Annotations" (pp. 126-134); Ch. 8, "Evaluating your Algorithm" (pp. 170-177).
Readings (available on the course website on the KIRO platform)
Bjerva, Johannes, and Raf Praet. 2015. "Word embeddings pointing the way for late antiquity." In Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pp. 53-57.
Menini, Stefano, Rachele Sprugnoli, Giovanni Moretti, Enrico Bignotti, Sara Tonelli, and Bruno Lepri. 2017. "Ramble on: tracing movements of popular historical figures." In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pp. 77-80.
Odijk, Jan. 2016. "Linguistic Research using CLARIN." Lingua, 178, pp. 1-4.
Nguyen, Dong, Barbara McGillivray, and Taha Yasseri. 2018. "Emo, love and god: making sense of Urban Dictionary, a crowd-sourced online dictionary." Royal Society Open Science, 5(5).
Further readings will be announced during the course and uploaded to the KIRO platform.
Assessment methods
Final oral exam covering material from the entire course.
Final assignment (5 pages) reporting the results of an in-depth corpus-based analysis of a linguistic, historical or literary phenomenon, with the topic agreed on in advance during office hours. The assignment must be sent in PDF format to jezek@unipv.it 7 days before the exam.