BIG DATA ANALYSIS
Enrollment year
2018/2019
Academic year of offering
2018/2019
Regulation
DM270
Scientific-disciplinary sector (SSD)
SECS-S/01 (STATISTICS)
Department
DIPARTIMENTO DI SCIENZE ECONOMICHE E AZIENDALI
Degree programme
INTERNATIONAL BUSINESS AND ENTREPRENEURSHIP - MANAGEMENT INTERNAZIONALE E IMPRENDITORIALITÀ
Curriculum
Digital Management
Year of course
Teaching period
First semester (24/09/2018 - 21/12/2018)
Credits
9
Hours
66 hours of classroom teaching
Language of instruction
English
Exam type
COMBINED WRITTEN AND ORAL
Lecturer
CERCHIELLO PAOLA (course holder) - 9 CFU
Prerequisites
Knowledge of basic statistical concepts such as inference, confidence intervals, hypothesis testing, and the simple regression model.
For the coding part, some familiarity with Python, or with similar software such as R, is recommended; it can be acquired through online courses (e.g. Coursera).
Learning objectives
Knowledge of the most relevant statistical methods for the analysis of large data sets. Students will learn how to run a complete analysis workflow in Python: from the collection and management of large data sets, through the choice of the most appropriate models, to the final interpretation and contextualization of the results.
Programme and contents
The aim of this course is to study and apply the most relevant statistical models for the analysis of large data sets.
The perspective is mainly applied: choosing and applying suitable models to exploit the full informative content of (large) data sets, with particular attention to the correct and contextualized interpretation of the final results. In addition, the course covers frameworks for managing large data sets, such as MapReduce, applied to data clustering.
The course makes interactive use of open-source software such as Python, so that students learn the complete analysis workflow in practice.
Particular emphasis will be given to social network data, textual data, and business and financial case studies.
Some of the models that will be covered are: Naive Bayes classifier, Latent Dirichlet Allocation, clustering algorithms, penalized regression, Support Vector Machines.
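As an illustration of the MapReduce approach to data clustering mentioned above, the following is a minimal single-machine sketch in plain Python. It is not course material: it assumes k-means as the clustering algorithm (the programme does not name one), and all function names and the toy data are hypothetical. The map step assigns each point to its nearest centroid; the reduce step averages the points of each cluster to obtain the updated centroids.

```python
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
    )


def map_phase(points, centroids):
    """Map: emit one (cluster_id, (point, 1)) pair per input point."""
    return [(nearest_centroid(p, centroids), (p, 1)) for p in points]


def reduce_phase(pairs, centroids):
    """Reduce: sum coordinates and counts per cluster, then average them."""
    sums = {}
    for k, (p, n) in pairs:
        if k in sums:
            s, c = sums[k]
            sums[k] = ([a + b for a, b in zip(s, p)], c + n)
        else:
            sums[k] = (list(p), n)
    updated = []
    for k, old in enumerate(centroids):
        if k in sums:
            s, c = sums[k]
            updated.append(tuple(x / c for x in s))
        else:
            # Empty cluster: keep the previous centroid (an assumption of this sketch).
            updated.append(tuple(old))
    return updated


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds (no convergence check in this sketch)."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 8.0)]
initial = [(0.0, 0.0), (10.0, 10.0)]
final_centroids = kmeans(points, initial)
print(final_centroids)
```

In a real MapReduce framework the map and reduce functions would run in parallel over partitions of the data; here they are ordinary function calls, which keeps the logic visible but gives no scalability benefit.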
Teaching methods
The class integrates theoretical lectures with Python-based practical sessions in which students learn to implement and analyze the most appropriate models for the available data.
A tutor will support students weekly in acquiring the necessary theoretical and practical knowledge.
All the material used during the lectures (slides, scripts, data) will be available on the Kiro platform.
Reference texts
1) Coelho and Richert, Building Machine Learning Systems with Python, Second edition, Packt Publishing.
2) Kevin Sheppard, Introduction to Python for Econometrics, Statistics and Data Analysis, pdf version available.
Assessment methods
Midterm on the first models covered.
Final analysis project based on real case studies.
Further information
Some lectures will be taught by experts in the big data field.