Course contents
The use of annotated corpora for linguistic analysis
This course teaches how to carry out corpus-based linguistic analysis using methodologies and tools developed in corpus linguistics and computational linguistics. Each student will design and carry out an in-depth corpus-based analysis of a linguistic phenomenon of their choice.
1. What counts as an annotated corpus.
2. Corpus query systems and methods.
3. Annotation and mark-up.
4. Regular expressions.
5. Frequency and association measures.
6. Types of corpora.
7. Examples of linguistic analyses and generalizations based on empirical evidence.
Recommended or required readings
Types of corpora
Lenci A., Montemagni S., Pirrelli V. 2005. Testo e Computer, Rome: Carocci, Chapter 1: “I dati della lingua”.
Annotation and mark-up
Lenci A., Montemagni S., Pirrelli V. 2005. Testo e Computer, Rome: Carocci, Chapter 8: “L’annotazione linguistica del testo”.
Corpus query tools and regular expressions
Kilgarriff A., Rychlý P., Smrž P., Tugwell D. 2004. “The Sketch Engine”. In Williams G. and S. Vessier (eds.), Proceedings of the XI Euralex International Congress, July 6-10, 2004, Lorient, France, 105-111.
Corpus (introduction)
Sinclair J. 2005. “Corpus and Text - Basic Principles”. In Wynne M. (ed.) Developing Linguistic Corpora: a Guide to Good Practice, Oxford: Oxbow Books: 1-16.
Corpus (web corpora)
Baroni M. and A. Kilgarriff. 2006. “Large Linguistically-Processed Web Corpora for Multiple Languages”. In Proceedings of EACL 2006 (European Association for Computational Linguistics), 87-90.
Annotation (introduction)
Leech G. 2005. “Adding Linguistic Annotation”. In Wynne M. (ed.) Developing Linguistic Corpora: a Guide to Good Practice, Oxford: Oxbow Books: 17-29.
Argument structure, frames, patterns
Jezek E. 2012. On the notion of Frame and Frame relation. ms. Dipartimento di Studi Umanistici, Università di Pavia, http://studiumanistici.unipv.it/?pagina=docenti&id=135.
Frame semantics
Fillmore C. J. 1982. “Frame semantics”. In The Linguistic Society of Korea (Ed.), Linguistics in the Morning Calm. Seoul: Hanshin Publishing Co, 111-137.
Corpus Pattern Analysis
Hanks P. 1996. “Contextual Dependency and Lexical Sets”. In International Journal of Corpus Linguistics 1 (1).
Annotation: FrameNet Annotation Guidelines
Ruppenhofer J., Ellsworth M., Petruk M. R. L., Johnson C. R., Scheffczyk J. 2010. FrameNet II: Extended Theory and Practice, International Computer Science Institute, Berkeley, Chapter 3.
Annotation: VerbNet Annotation Guidelines
http://verbs.colorado.edu/verb-index/VerbNet_Guidelines.pdf
Annotation: Corpus Pattern Analysis Validation Manual
http://nlp.fi.muni.cz/projects/cpa/CPA_valiman.pdf