Big Data: volume, velocity, variety

  • School / Prep

    ENSMAC

  • ECTS

    6 credits

Internal code

PC0BDATA

Description

The rapid development of technologies such as microprocessors, storage systems, 5G, RFID chips and blockchain has made it possible to collect data at unprecedented speed and volume. Analyzing or exploiting this abundance of information requires specific methods, such as exploratory data analysis and machine learning algorithms, including large language models (LLMs) such as ChatGPT.
In this context, the course aims to make students aware of the importance of massive datasets and of how to exploit them. Most of the module is devoted to practical exercises and projects. With this knowledge, students will be able to act as the interface between chemists/biologists and data scientists, helping to make the best use of the information contained in these data.

Teaching hours

  • Project: 20 h
  • Practical work: 22 h
  • Lectures: 8 h

Syllabus

Contents

General information on generating, manipulating and representing massive data sets (4 h)

- Acquire and organize: experimental design, rules and principles, formats, types of data, databases
- Store, share, protect, archive: storage issues, sharing practices, version management, protecting data against piracy or accidental loss, ethical use (GDPR, mass manipulation)
- Manipulate: extracting, transforming, cleaning, exploratory analysis
- Visualize: reminder of the rules of data presentation and basic graphical representations (placing data in context, mistakes to avoid, finding a suitable representation); visualizations suited to massive datasets (PCA, heatmaps, correlation matrices)
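As an illustration of the correlation matrices mentioned in the last item, here is a minimal sketch in plain Python. The variable names and measurements are hypothetical and invented for this example; they are not course material. In practice a library such as NumPy or pandas would normally be used, and the resulting matrix is what a heatmap would display.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return the variable names and the matrix of pairwise correlations."""
    names = list(columns)
    matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical measurements, invented purely for illustration.
data = {
    "temperature": [20.0, 25.0, 30.0, 35.0, 40.0],
    "yield": [1.0, 2.0, 3.0, 4.0, 5.0],
    "impurity": [5.0, 4.0, 3.0, 2.0, 1.0],
}

names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(f"{name:>12}", " ".join(f"{value:+.2f}" for value in row))
```

Each cell lies between -1 and +1, so the matrix can be colored directly as a heatmap to spot groups of strongly related variables.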

Data analysis project (20 h)

Students will carry out a group data analysis project on a real experimental dataset. The main objective is to explore and analyze this dataset in order to answer the scientific questions posed by the experimenter. This hands-on work gives students direct experience of a massive dataset by putting manipulation, analysis and visualization methods into practice. The project is evaluated through the submission of a script and an oral presentation. This approach allows students to develop their skills in analyzing massive data while tackling the real challenges encountered in scientific practice.

Blockchain technology (8 h)

Bitcoin was born in 2009 and regularly makes the headlines. But Bitcoin is just one of many applications of blockchain technology. The aim of this section is first to understand how Bitcoin works, then to become aware of the paradigm shift that blockchain technology can bring about (notably by making it possible to share and certify data without the usual "trusted third parties"), and finally to reflect on the current and possible future consequences for business, research and other fields. The following topics will be covered:
- Basic principles of blockchain technology (lecture, 2 h)
- "Bitcoin practical" (practical work, 4 h)
- Blockchain ecosystem: presentation of various blockchain projects (oral presentation, 2 h)
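The idea of certifying data without a trusted third party rests on chaining blocks by their hashes. The sketch below is a deliberately simplified illustration, not Bitcoin's actual block format: it has no proof of work, no signatures and no network consensus, and the helper names and sample records are hypothetical.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 digest of a block's contents, serialized deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_block(chain, data):
    """Append a block that stores the hash of the block before it."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous_hash})


def is_valid(chain):
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


# Hypothetical lab records, invented for illustration.
chain = []
add_block(chain, "sample A: measured mass 1.02 g")
add_block(chain, "sample B: measured mass 0.98 g")
add_block(chain, "sample C: measured mass 1.05 g")
valid_before = is_valid(chain)

# Altering an earlier record breaks the link stored in the following block.
chain[1]["data"] = "sample B: measured mass 9.99 g"
valid_after = is_valid(chain)

print(valid_before, valid_after)
```

Because each block commits to the hash of its predecessor, changing one record would require recomputing every later block, which is what real blockchains make expensive or impossible through consensus mechanisms.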

Deep learning algorithms (18 h)

Deep learning is a branch of machine learning that has undergone spectacular development in recent years. It is based on architectures of deep artificial neural networks, capable of learning complex data representations and performing tasks such as image recognition, machine translation, text generation, and so on. The aim of these practical exercises is to introduce the fundamental concepts and apply them to a range of different topics.
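To make the notion of a layered network concrete, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights are fixed by hand for this illustration rather than learned, and all names are hypothetical. The example shows that a hidden layer with a nonlinear activation can represent XOR, which no single linear layer can; real deep learning work would rely on a framework and on training from data.

```python
def relu(value):
    """Rectified linear unit: keep positive values, clip negatives to zero."""
    return max(0.0, value)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weighted sum (plus bias) per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total) if activation else total)
    return outputs


# Hand-chosen weights (an assumption for illustration, not trained values).
HIDDEN_WEIGHTS = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_BIASES = [0.0, -1.0]
OUTPUT_WEIGHTS = [[1.0, -2.0]]
OUTPUT_BIASES = [0.0]


def forward(inputs):
    """Two inputs -> two hidden ReLU neurons -> one linear output."""
    hidden = dense(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES, relu)
    return dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]


for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, forward([a, b]))
```

Training replaces the hand-chosen weights with values found by gradient descent on example data; deeper networks stack more such layers.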

Assessment methods

Project and defense

Manager

Emilien Peltier

Further information

Companies, Trades and Cultures

Assessment of knowledge

Initial assessment / Main session - Tests

| Type of assessment | Type of test | Duration (in minutes) | Number of tests | Test coefficient | Eliminatory mark in the test | Remarks |
| Continuous assessment | Skills assessment | | | | | |

Second chance / Catch-up session - Tests

| Type of assessment | Type of test | Duration (in minutes) | Number of tests | Test coefficient | Eliminatory mark in the test | Remarks |
| Continuous assessment | Skills assessment | | | | | |