
This course introduces the principal techniques related to the analysis of large amounts of data.

## Information

| Date | Info |
|------|------|
| 13/03/2018 | Office hours canceled: Starting Wednesday 14/03, regular office hours are canceled; students can arrange an appointment via e-mail. |
| 09/01/2018 | Big scale analytics test of January 2018: The Big scale analytics test of January will take place on 15/01, starting at 9:30, in the professor's office. |
| 20/04/2017 | Office hours of Jan 9th canceled: The office hours of Jan 9th are canceled; students can arrange an appointment via e-mail. |
| 16/12/2017 | Schedule change for the Big scale analytics course: The December 19th class of the Big scale analytics course is canceled. It will be made up on Jan 8th, 2018. |
| 07/12/2017 | Schedule change for the Big scale analytics course: The December 11th class of the Big scale analytics course will take place on Dec 12th. |
| 29/11/2017 | Schedule change for the Big scale analytics course: The forthcoming classes of the Big scale analytics course will take place on Dec 4th, Dec 11th, Dec 18th and Dec 19th. |
| 02/11/2017 | Schedule change for the Big scale analytics course: The 27/11 class of the Big scale analytics course will take place in aula 6 at 14:30, and the 28/11 class will take place in aula alfa at 13:30. |
| 04/10/2017 | Docker container for the Big scale analytics course: Students of the Big scale analytics course can download a ZIP archive containing all the files needed to build and run the Docker container used during lectures. |

## Language

Lectures are in Italian.

## Course schedule

Lectures will take place at the Computer science department, according to the following tentative schedule:

| Day | Hour | Place |
|-----|------|-------|
| Monday | 14:30 - 16:30 | aula Delta |
| Tuesday | 14:30 - 16:30 | aula Omega |

Any change to the schedule will be announced in class and published in the Information section of this page.

## Office hours

Thursday at 17:00 (online: https://meet.jit.si/ricevimento-malchiodi). Students can also contact the teacher by e-mail, after reading the guide prepared by Prof. Sebastiano Vigna and clearly stating the course name and the academic year in the message. In particular, students are encouraged to always write from their academic address (i.e. one in the studenti.unimi.it domain) and to sign with their name and student ID number. Response time may vary depending on the teacher's commitments.

## Course material

The theoretical part of the course is based on the following textbook: Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets (marked as RU in the lecture calendar), available both as a freely downloadable PDF and in hardcopy from Cambridge University Press (ISBN: 9781107015357). The suggested readings for the practical part are Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark. Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN: 978-1-449-35862-4) and Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark. Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN: 978-1-491-91276-8).

Some labs refer to the Data Science and Engineering with Spark edX program.

## Syllabus

The course explains the topics listed in the lecture calendar, covering the textbook contents as well as the contents of the remaining documents listed in Course material.