LABORATORY ON LINGUISTIC DATA ANALYSIS
Enrollment year
2017/2018
Academic year
2017/2018
Regulations
DM270
Academic discipline
L-LIN/01 (GLOTTOLOGY AND LINGUISTICS)
Department
DEPARTMENT OF HUMANITIES
Course
THEORETICAL AND APPLIED LINGUISTICS; LINGUISTICS AND MODERN LANGUAGES
Curriculum
COMMON TRACK
Year of study
Period
(26/02/2018 - 01/06/2018)
ECTS
6
Lesson hours
36 lesson hours
Language
Italian
Activity type
ORAL TEST
Teacher
JEZEK ELISABETTA (course holder) - 6 ECTS
Prerequisites
Familiarity with basic notions in general linguistics, particularly morphology, syntax, semantics, and pragmatics, as taught in three-year Bachelor's degree programmes in the Humanities.
Learning outcomes
The aim of the course is to provide the students with the knowledge and skills needed to examine linguistic data from a variety of perspectives, through the use of digital resources such as corpora, lexicons, concordance tools, databases, knowledge bases, datasets, and ontologies. At the end of the course the students will be able to autonomously design and perform a linguistic analysis using methodologies primarily based on manual or semiautomatic annotation of data, with the goal of extracting or verifying linguistic generalizations for theoretical or applied purposes.
Course contents
The course focuses on two types of linguistic data:

- interactive corpora (social media networks, forums, blogs)
- news corpora, editorials

With the aid of the selected readings, we examine the creation, annotation, and structure of these corpora and their use for linguistic analysis and computational applications.
Teaching methods
Interactive lectures
Slides
Seminars with group presentations of the readings and discussion
Recommended or required readings
Baldwin T., Cook P., Lui M., MacKinlay A. and L. Wang. 2013. "How Noisy Social Media Text, How Diffrnt Social Media Sources?" In Proceedings of the International Joint Conference on Natural Language Processing, pages 356–364, Nagoya, Japan, 14-18 October 2013.

Bender, E.M., Morgan, J.T., Oxley, M., Zachry, M., Hutchinson, B., Marin, A., Zhang, B. and M. Ostendorf. 2011. "Annotating social acts: Authority claims and alignment moves in Wikipedia talk pages." In Proceedings of the Workshop on Languages in Social Media, pp. 48-57. Association for Computational Linguistics.

Celli F., Riccardi G. and F. Alam. 2016. "Multilevel annotation of agreement and disagreement in Italian news blogs". In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portoroz, Slovenia.

Celli, F., Stepanov, E. A., Poesio, M., & Riccardi, G. (2016, December). Predicting Brexit: Classifying agreement is better than sentiment and pollsters. In Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (pp. 110-118).

Herdağdelen, A. and M. Marelli. 2016. "Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition". In Cognitive Science, pp. 1-20. http://onlinelibrary.wiley.com/doi/10.1111/cogs.12392/full

Mohammad, S. M., Kiritchenko S., Sobhani P., Zhu X., and C. Cherry. 2016. "A dataset for detecting stance in tweets." In Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), Portoroz, Slovenia.

Oraby, S., Reed L., Compton R., Riloff E., Walker M. and S. Whittaker. 2015. "And That’s A Fact: Distinguishing Factual and Emotional Argumentation in Online Dialogue." In Proceedings of the 2nd Workshop on Argumentation Mining, pp. 116-126.

Vlachos, A., & Riedel, S. (2014). Fact Checking: Task definition and dataset construction. ACL 2014, 18.

Zubiaga, A., Liakata, M., Procter, R., Hoi, G.W.S. and P. Tolmie. 2016. "Analysing how people orient to and spread rumours in social media by looking at conversational threads". In PloS one, 11(3), pp. 1-29.
Assessment methods
Final oral exam covering the material from the entire course.
Final assignment (8 pages) reporting the results of an in-depth corpus-based analysis of a linguistic phenomenon agreed on in advance during office hours. The text, in PDF format, must be sent to jezek@unipv.it 7 days before the exam.
Further information
Material for the course, including the updated list of readings, the lecture slides, links to available datasets, and instructions for the final assignment, is available on the KIRO platform (access with personal username and password).