The objective of the course is to introduce the fundamental concepts underlying massive data management and analysis. These include the main techniques for processing data at massive scale and their implementation on distributed computing frameworks.

## Expected results

At the end of the module, students will know the main approaches to analyzing massive amounts of data. They will also be able to design and execute computations on big data, deployed on modern distributed computing systems.

## Language

Lectures are in English.

## Course schedule

Lectures take place in the Città Studi neighborhood according to the following schedule:

| Day | Hour | Place |
|---|---|---|
| Monday | 12:30-14:30 | Aula Alfa |

A different schedule applies on the following dates:

• 17/01: lecture in classroom Beta;
• 25/01: lecture on Tuesday, replacing the Monday lecture of 24/01;
• 22/02: lecture on Tuesday, replacing the Monday lecture of 21/02;
• 23/02, 12:30-14:30: additional lecture in classroom 303;
• 28/02: lecture in classroom MA (Via Mangiagalli 31);
• 07/03: lecture in classroom MA (Via Mangiagalli 31).

Any change to the schedule will be announced in class and published in the News section of this page.

### Online lectures

Until further notice, lectures will also be streamed via a Zoom link, accessible after authentication, which is published on the course's Ariel page.

Please note that the purpose of streaming is to encourage «[...] the participation of students with particular frailties, or who are immunosuppressed, or not yet in possession of the COVID-19 green certification, as well as of international students unable to attend in presence because of travel limitations caused by the epidemiological emergency» (Rector's decree on teaching activities, 23 August). In all other cases, in-person attendance is strongly recommended unless no seats remain available in the classroom.

## Office hours

Thursday, 17:00.

## Course material

Lectures are based on:

• the textbook Mining of Massive Datasets by A. Rajaraman and J. Ullman (marked RU in the calendar of lectures), available as a free download from the authors' website and published in hardcopy by Cambridge University Press (ISBN: 9781107015357);
• the notes and sample code published in the calendar of lectures.

## Syllabus

The course covers the topics listed in the lecture calendar (available at the beginning of the course), drawing on the textbook as well as on the other documents listed under Course material.

### Prerequisites

The course requires knowledge of the main topics of bachelor-level computer programming, calculus, probability, and statistics.

## Exam modalities

The exam for the module consists of a project and an oral test, both related to the topics covered in the course. The project requires processing one or more datasets through the critical application of the techniques described in class, and it is documented in a written report. The project is graded pass/fail, based on the level of mastery of the topics and the clarity of the report. Students may take the oral test only after passing the project. The oral test consists of a discussion of some of the topics covered in the course and in-depth questions about the submitted project. It is graded on a scale from 0 to 30, taking into account the level of mastery of the topics, clarity, and language skills. For the exam modalities of the other modules, students should refer to those modules' Web pages.

### Written exam (Prof. Ardagna, Prof. Foresti)

### Project and oral exam (Prof. Malchiodi)

## Exam sessions

| Session | Sign-up date | Project deadline | Oral exams |
|---|---|---|---|
| March | 31/03 | 27/03, 31/03 | 01/04, 06/04 |
| June | 14/06 | 12/06 | 15/06 |
| July | 07/07 | 03/07 | 07/07 |
| September | 08/09 | 18/09 | 20/09 |
| December | N/A | N/A | N/A |
| January | N/A | N/A | N/A |