
This course introduces the principal techniques related to the analysis of large amounts of data.

## Information

Date Info
21/05/2015 Office hours of May 28th canceled
The office hours of May 28th are canceled; students can arrange an appointment via e-mail.
19/03/2015 Office hours of April 2nd canceled
The office hours of April 2nd are canceled; students can arrange an appointment via e-mail.
12/03/2015 Office hours rescheduled on 12/3
Today's office hours will start at 16:00.
05/03/2015 Office hours for the spring semester
Office hours for the spring semester will be on Thursday at 14:30 in the teacher's office, starting from 5/3.
09/01/2015 Office hours canceled
Regular office hours are canceled until next semester; students can arrange an appointment via e-mail.
25/11/2014 Office hours rescheduled on 25/11
Today's office hours will last until 15:00; students can arrange an appointment via e-mail.
13/11/2014 Tutorial about MapReduce in AWS
I published a tutorial about running JAR-encoded MapReduce jobs in AWS.
15/10/2014 AWS in Education grant for the Big scale analytics course
Students enrolled in the Big scale analytics course may benefit from a grant of US$100 to access the tools provided by Amazon Web Services.
09/10/2014 Tutorial about Hadoop installation
I published an updated tutorial about installing Hadoop on a virtual machine.

## Language

Lectures are in Italian.

## Course schedule

Lectures will take place at the Computer science department, according to the following tentative schedule:

| Day | Hour | Place |
|---|---|---|
| Tuesday | 16:30 - 18:30 | aula 5 |
| Thursday | 15:30 - 17:30 | aula 5 |

Any change to the schedule will be announced in class and published in the Information section of this page.

## Office hours

Thursday, at 17:00.

The teacher can be contacted by e-mail. Before writing, please read the guide prepared by Prof. Sebastiano Vigna, and clearly specify the course name and the academic year in the message. In particular, students are encouraged to always use their academic address (i.e. one based on the domain studenti.unimi.it) and to sign with their name and student ID number. The response time may vary depending on the teacher's commitments.

## Course material

The course is based on the following textbook:

Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, available both as a freely downloadable PDF and in hardcopy from Cambridge University Press (ISBN 9781107015357).

The part on distributed file systems and MapReduce is based on the adopted textbook and on the Hadoop tutorial published by Yahoo! Students enrolled in this course may benefit from a US$100 credit to use the tools provided by Amazon Web Services.

The part on machine learning is described in the additional chapter of the textbook available online, in the third chapter of S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice Hall, 1999 (ISBN 0-13-908385-5), and in two online tutorials about classification and regression.

The part on dimensionality reduction is described in the additional chapter of the textbook available online.

## Syllabus

The course covers the topics listed in the lecture calendar, which correspond to the following textbook chapters: 1, 2 (through section 2.6.2), 3 (through section 3.7), 4 (through section 4.5), 5 (excluding sections 5.2.4 and 5.2.5), 6 (through section 6.5.1), 7 (through section 7.5), 8 (through section 8.4.6), 9 (through section 9.4), 10 (sections 10.1, 10.2, 10.4, and 10.5), 11 (through section 11.3), and 12 (through section 12.3). It also covers the contents of the remaining documents listed in Course material.