This course introduces the principal techniques for analyzing large amounts of data.
Lectures will be held at the Computer Science Department according to the following tentative schedule:
| Day | Time | Room |
|---|---|---|
| Monday | 14:30 - 16:30 | Room Delta |
| Tuesday | 14:30 - 16:30 | Room Omega |
Any change to the schedule will be announced in class and published in the News section of this page.
The teacher can be contacted by e-mail. Before writing, please read the guide prepared by Prof. Sebastiano Vigna, and clearly state the course name and the academic year in the message. In particular, students are encouraged to always write from their academic address (i.e., one in the studenti.unimi.it domain) and to sign with their name and student ID number. Keep in mind that response times may vary depending on the teacher's commitments.
The theoretical part of the course is based on the following textbook: Anand Rajaraman and Jeff Ullman, Mining of Massive Datasets, which is available both as a freely downloadable PDF and in hardcopy from Cambridge University Press (ISBN: 9781107015357). The suggested readings for the practical part are Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia, Learning Spark: Lightning-Fast Big Data Analysis, O'Reilly, 2015 (ISBN: 978-1-449-35862-4), and Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills, Advanced Analytics with Spark: Patterns for Learning from Data at Scale, O'Reilly, 2015 (ISBN: 978-1-491-91276-8).
The part on distributed file systems and MapReduce is based on the adopted textbook and on the Hadoop tutorial published by Yahoo!
Some labs are based on material from the Data Science and Engineering with Spark program on edX.
The exam consists of an oral test, held by appointment.