These lectures provide the theoretical and practical foundations for storing and efficiently processing large volumes of data: collecting, retrieving and accessing Big Data.
We will first study how to analyze, organize and present Big Data in order to address its specific challenges: reducing complexity, processing the data deluge in real time, and proposing new paradigms that enable the extraction of relevant knowledge. The course will then introduce state-of-the-art Big Data computing platforms, with a focus on how to use them to process (manage and analyze) massive datasets. Specifically, we will discuss the Apache Hadoop MapReduce and Apache Spark frameworks, which are among the most accessible and practical means of computing with large datasets in the Cloud.
Lecturer: Alexandru Costan alexandru.costan@insa-rennes.fr