Period: Second semester

Course unit contents: 

Part 1) Data Management
- Introducing data structures
- Exploring storage models
- Storage reliability and data preservation
- Understanding security in data management
- Scalability principles for storage
- Comparing local and distributed file systems
- Examining database management principles
- Managing and retrieving data from relational databases (MySQL)

Part 2) Data Processing
- Basics of computation and the limitations of single-threaded CPUs
- Introduction to threading and parallel processing techniques
- Overview of basic parallelization patterns in Python
- Understanding distributed computing systems
- Hadoop as a paradigm for big data processing
- Implementing data processing with Apache Spark
- Employing Dask for data processing tasks
- Understanding Apache Kafka as a distributed streaming platform


For the hands-on sessions on both parts:

- Basics of containerization methodologies (Docker)

Planned learning activities and teaching methods: 

Frontal lectures for the introductory topics, with examples and use cases.
Hands-on sessions with live-coding examples run by the lecturers.
Exercises and examples to be done in the IT lab.

Last modified: Wednesday, 28 August 2024, 09:33