The course introduces the fundamental concepts underlying the management and analysis of massive data, including the main techniques for processing data at massive scale and their implementation on distributed computational frameworks.

Expected results

At the end of the module, students will know the main approaches for analyzing massive amounts of data, and will be able to design and execute computations on big data deployed on modern distributed computing systems.

News

Date Info
24/02/2021 Projects for the «Algorithms for massive datasets» course (Master DSE)
The description of the projects for the course «Algorithms for massive datasets» for the Master in «Data Science for Economics» is available.
19/02/2021 Tutoring «Algorithms for massive datasets» (DSE)
Starting from March 1st, students can attend five tutoring sessions held on the same days previously used for lectures (Monday and Tuesday, 15:30-17:00). Students will receive an e-mail message containing the corresponding Zoom link.
29/09/2020 Semester for the Algorithms for massive datasets (DSE) course
The class of Algorithms for massive datasets (DSE) will be delivered in the spring semester.

Language

Lectures are in English.

Office hours

By appointment (via e-mail). Students can contact the teacher by e-mail, after reading the guide prepared by Prof. Sebastiano Vigna, and should clearly specify the course name and the academic year in the message. In particular, students are encouraged to always use their academic address (i.e., the one on the domain studenti.unimi.it) and to sign with their name and student ID number. The response time may vary depending on the teacher's commitments.

Course material

Lectures are based:

• on the textbook Mining of Massive Datasets, written by A. Rajaraman and J. Ullman (marked by RU in the calendar of lectures), available as a free download from the authors' website and published in hardcopy by Cambridge University Press (ISBN: 9781107015357);
• on the notes and sample code published in the calendar of lectures.

Syllabus

The course explains the topics listed in the lecture calendar (available at the beginning of the course), covering the textbook contents as well as the contents of the remaining documents listed in Course material.

Prerequisites

The course requires knowledge of the main topics of bachelor-level computer programming, calculus, probability, and statistics.

Exam modalities

The exam consists of a project and an oral test, both related to the topics covered in the course. The project requires students to process one or more datasets through the critical application of the techniques described during the classes, and to describe the work in a written report. The evaluation of the project, expressed as a pass/fail mark, considers the level of mastery of the topics and the clarity of the report. The oral test, which students can take only after a positive evaluation of the project, is based on the discussion of some topics covered in the course and on in-depth questions about the presented project. The evaluation of the oral test, expressed on a scale from 0 to 30, takes into account the level of mastery of the topics, clarity, and language skills.

Session Date
June N/A
July N/A
September N/A
September N/A
January N/A
February N/A