Section outline

  • These lectures provide the theoretical and practical bases for storing and effectively processing large volumes of data: collecting, retrieving, accessing Big Data. 

    We will first study how to analyze, organize and present Big Data in order to address its specific challenges: reducing complexity, processing the data deluge in real time, and proposing new paradigms for extracting relevant knowledge. The course will then introduce state-of-the-art Big Data computing platforms, with a focus on how to use them to process (manage and analyze) massive datasets. Specifically, we will discuss the Apache Hadoop MapReduce and Apache Spark frameworks, which provide the most accessible and practical means of computing with large datasets in the Cloud. 

    • Big Data overview (definition, characteristics)
    • Cloud storage models (Binary Large Objects: Amazon S3, Azure Blobs), NoSQL (Google BigTable, Cassandra), disk storage (GoogleFS, HDFS, PVFS, Lustre), in-memory storage (key-value stores, hybrid systems: memcached, MongoDB)
    • Batch vs. stream processing
    • Consistency models
    • Big Data processing models: MapReduce
    • Big Data platforms: Apache Hadoop, Apache Spark
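    To give a flavor of the MapReduce model listed above, here is a minimal single-machine sketch in plain Python (no Hadoop or Spark required): a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums the counts. The function names and the toy documents are illustrative, not part of any framework API.

    ```python
    from collections import defaultdict
    from itertools import chain

    def map_phase(document):
        # "Map" step: emit a (word, 1) pair for each word in the document.
        return [(word, 1) for word in document.split()]

    def shuffle(pairs):
        # Group values by key, as the framework does between map and reduce.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(key, values):
        # "Reduce" step: aggregate all counts for one word.
        return key, sum(values)

    docs = ["big data big ideas", "data deluge"]
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    # counts == {"big": 2, "data": 2, "ideas": 1, "deluge": 1}
    ```

    Real platforms such as Hadoop and Spark follow the same pattern but partition the map and reduce work across a cluster and handle the shuffle, fault tolerance and data locality for you.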

    Lecturer: Alexandru Costan alexandru.costan@insa-rennes.fr