MSc in Computer science (Università degli Studi di Milano)

This course introduces the principal techniques related to the analysis of large amounts of data.


Date Info
06/06/2016 Office hours canceled
Starting Monday 06/13, regular office hours are canceled until next semester; students can arrange an appointment via e-mail.
01/05/2016 Office hours canceled for two weeks
Office hours in the next two weeks are canceled; students can contact the teacher via e-mail.
16/02/2016 Office hours of February 22nd canceled
The office hours of February 22nd are canceled; students can arrange an appointment via e-mail.
23/11/2015 AWS credit
Students of the Big scale analytics course can get promotional codes for AWS by registering for the AWS Educate program and specifying «Università degli Studi di Milano» as their institution. Additional credits can be obtained through the GitHub education program.
05/10/2015 Suspension of the Big scale analytics lesson
The Big scale analytics class of October 7th is canceled.
01/10/2015 Office hours for the fall semester
Office hours for the fall semester will be held on Mondays at 14:30 in the teacher's office, starting from 5/10.


Lectures are in Italian.

Course schedule

Lectures will take place at the Computer Science Department, according to the following tentative schedule:

Day Hour Place
Monday 14:30 - 16:30 aula 5
Wednesday 14:30 - 16:30 aula 6

Any change to the schedule will be announced in class and published in the News section of this page.

Office hours

By appointment, room 5015 of the Computer Science Department. Students can contact the teacher by e-mail, after reading the guide prepared by Prof. Sebastiano Vigna and clearly specifying the course name and the academic year in the message. In particular, students are encouraged to always use their academic address (i.e. the one based on the university domain), to sign with their name and student ID number, and to keep in mind that the response time may vary depending on the teacher's commitments.

Course material

The theoretical part of the course is based on the following textbook: Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, available both as a freely downloadable PDF and in hardcopy from Cambridge University Press (ISBN:9781107015357). The suggested readings for the practical part are Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark. Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN:978-1-449-35862-4) and Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark. Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN:978-1-491-91276-8).

The part on distributed file systems and MapReduce is based on the adopted textbook and on the Hadoop tutorial published by Yahoo!


The course explains the topics listed in the lecture calendar, covering the textbook contents as well as the contents of the remaining documents listed in Course material.

Lecture calendar


Exam modalities

The exam consists of an oral test, by appointment.