LABORATORY ON LINGUISTIC DATA ANALYSIS

Enrollment year

2016/2017

Academic year

2016/2017

Regulations

DM270

Academic discipline

L-LIN/01 (GLOTTOLOGY AND LINGUISTICS)

Department

DEPARTMENT OF HUMANITIES

Course

THEORETICAL AND APPLIED LINGUISTICS; LINGUISTICS AND MODERN LANGUAGES

Curriculum

COMMON TRACK

Year of study

1st

Period

(27/02/2017 - 01/06/2017)

ECTS

6

Lesson hours

36 lesson hours

Language

Italian

Activity type

ORAL TEST

Teacher

JEZEK ELISABETTA (course holder) - 6 ECTS

Prerequisites

Familiarity with basic notions of general linguistics.

Learning outcomes

The course aims to provide students with the knowledge and skills needed to describe and analyze linguistic data from a variety of perspectives, using digital resources such as corpora, lexicons, concordance tools, databases, and knowledge bases.

Course contents

The course focuses on two types of corpora:

- interactive corpora (social networks, forums, blogs)
- news corpora (including editorials)

Through dedicated readings, we examine the creation, annotation, and structure of these corpora and their use for linguistic analysis and computational applications.

Teaching methods

Lectures
Slides
Lab
Meetings with teaching assistant

Recommended or required readings

Baldwin, T., Cook, P., Lui, M., MacKinlay, A. and L. Wang. 2013. "How Noisy Social Media Text, How Diffrnt Social Media Sources?" In Proceedings of the International Joint Conference on Natural Language Processing, pp. 356-364, Nagoya, Japan, 14-18 October 2013.

Bender, E.M., Morgan, J.T., Oxley, M., Zachry, M., Hutchinson, B., Marin, A., Zhang, B. and M. Ostendorf. 2011. "Annotating Social Acts: Authority Claims and Alignment Moves in Wikipedia Talk Pages." In Proceedings of the Workshop on Languages in Social Media, pp. 48-57. Association for Computational Linguistics.

Celli, F., Riccardi, G. and F. Alam. 2016. "Multilevel Annotation of Agreement and Disagreement in Italian News Blogs." In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portoroz, Slovenia.

Herdağdelen, A. and M. Marelli. 2016. "Social Media and Language Processing: How Facebook and Twitter Provide the Best Frequency Estimates for Studying Word Recognition." In Cognitive Science, pp. 1-20. http://onlinelibrary.wiley.com/doi/10.1111/cogs.12392/full

Mohammad, S.M., Kiritchenko, S., Sobhani, P., Zhu, X. and C. Cherry. 2016. "A Dataset for Detecting Stance in Tweets." In Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC 2016), Portoroz, Slovenia.

Oraby, S., Reed, L., Compton, R., Riloff, E., Walker, M. and S. Whittaker. 2015. "And That's A Fact: Distinguishing Factual and Emotional Argumentation in Online Dialogue." In Proceedings of the 2nd Workshop on Argumentation Mining, pp. 116-126.

Zubiaga, A., Liakata, M., Procter, R., Hoi, G.W.S. and P. Tolmie. 2016. "Analysing How People Orient to and Spread Rumours in Social Media by Looking at Conversational Threads." In PLoS ONE, 11(3), pp. 1-29.

Additional references to be discussed in the seminars will be provided in class and uploaded to Kiro at the beginning of the course.

Assessment methods

Final oral exam covering material from the entire course.
Final assignment (8 pages) reporting the results of an in-depth corpus-based analysis of a linguistic phenomenon agreed upon in advance during office hours. The assignment must be sent in PDF format to jezek@unipv.it 7 days before the exam.
