School / Prep
ENSMAC
ECTS
6 credits
Internal code
PC0BDATA
Description
The rapid development of new technologies such as microprocessors, storage systems, 5G, RFID chips and blockchain has paved the way for the collection of data at unprecedented speed and volume. Analyzing and exploiting this abundance of information requires specific methods: exploratory data analysis and machine learning algorithms, including large language models (LLMs) such as ChatGPT.
In this context, the course aims to make students aware of the importance of massive datasets and of how to exploit them. Most of the module is devoted to practical exercises and projects. With this knowledge, students will be able to act as the interface between chemists/biologists and data scientists, promoting optimal exploitation of the information contained in these data.
Teaching hours
- Project (PRJ): 20 h
- Practical work: 22 h
- Lectures (CM): 8 h
Syllabus
Contents
General information on generating, manipulating and representing massive datasets (4 h)
- Acquire and organize: experimental design, rules/principles, formats, types of data, databases
- Store, share, protect, archive: storage issues, sharing practices, version management, protecting data against hacking or accidental loss, using data ethically (GDPR, mass manipulation)
- Manipulate: extracting, transforming, cleaning, exploratory analysis
- Visualize: reminder of the rules of data presentation and basic graphical representations (placing data in context, mistakes to avoid, finding a suitable representation), visualizations adapted to massive datasets (PCA, heatmaps, correlation matrices)
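As an illustration of the visualization step, a correlation matrix can be computed in a few lines. The sketch below is only a minimal example under stated assumptions: the column names and values are invented for illustration, the helper functions `pearson` and `correlation_matrix` are hypothetical, and a real project would more likely rely on a data analysis library and a graphical heatmap.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return a nested dict {name: {name: r}} covering every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical measurements, invented for illustration only.
data = {
    "temperature": [20.0, 25.0, 30.0, 35.0, 40.0],
    "yield": [41.0, 51.0, 61.0, 71.0, 81.0],  # exactly 2 * temperature + 1
    "ph": [7.2, 6.9, 7.4, 7.0, 7.1],
}

matrix = correlation_matrix(data)

# Print the matrix as a text table, one row per variable.
for name, row in matrix.items():
    print(name.ljust(12), "  ".join(f"{row[other]:+.2f}" for other in data))
```

Because `yield` is an exact linear function of `temperature` in this toy data, their coefficient is 1, which is the kind of strong relationship a heatmap of the matrix would highlight.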
Data analysis project (20 h)
Students will carry out a group data analysis project on a real experimental dataset. The main objective is to explore and analyze this dataset in order to answer the scientific questions posed by the experimenter. This hands-on experience lets students work directly with a massive dataset by putting manipulation, analysis and visualization methods into practice. The project will be evaluated through the submission of a script and an oral presentation. This approach will enable students to develop their skills in analyzing massive data while tackling the real challenges encountered in scientific practice.
Blockchain technology (8 h)
Bitcoin was launched in 2009 and regularly makes the headlines, but it is just one of many applications of blockchain technology. This section aims to explain how Bitcoin works, to raise awareness of the paradigm shift that blockchain technology can bring about (notably by making it possible to share and certify data without the usual "trusted third parties"), and to reflect on its current and potential future consequences for business, research and other fields. The following topics will be covered:
- Basic principles of blockchain technology (CM, 2h)
- Bitcoin practical session ("TP Bitcoin") (TP, 4h)
- Blockchain ecosystem: presentation of various blockchain projects (oral presentation, 2h)
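The idea of certifying data without a trusted third party can be sketched with a simple hash chain, in which each block stores the hash of the previous one. This is a deliberately simplified illustration and not Bitcoin's actual block format: it has no proof of work, no network and no signatures, and the function names (`block_hash`, `add_block`, `is_valid`) and sample records are hypothetical.

```python
import hashlib
import json

def block_hash(block):
    """SHA-256 digest of a block's contents (its own hash field is excluded)."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "data", "previous_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def add_block(chain, data):
    """Append a block that points to the hash of the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "previous_hash": previous_hash}
    block["hash"] = block_hash(block)
    chain.append(block)

def is_valid(chain):
    """Check each block's stored hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Hypothetical records, invented for illustration only.
chain = []
for record in ["sample A: 12.1 mg", "sample B: 9.8 mg", "sample C: 11.4 mg"]:
    add_block(chain, record)

print(is_valid(chain))  # True: the chain is untouched

chain[1]["data"] = "sample B: 99.9 mg"  # tamper with a stored record
print(is_valid(chain))  # False: the stored hash no longer matches the data
```

Any change to a past record alters its hash, so the modification is detectable by anyone holding a copy of the chain.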
Deep learning algorithms (18 h)
Deep learning is a branch of machine learning that has undergone spectacular development in recent years. It is based on architectures of deep artificial neural networks, capable of learning complex data representations and performing tasks such as image recognition, machine translation, text generation, and so on. The aim of these practical exercises is to introduce the fundamental concepts and apply them to a range of different topics.
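To make the notion of a layered neural network concrete, the sketch below runs the forward pass of a tiny two-layer network on the XOR problem. It is only an illustration under stated assumptions: the weights are hand-picked rather than learned, the names (`dense`, `relu`, `forward`) are hypothetical, and real practical work would typically use a deep learning framework with training by gradient descent.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the input weights of unit j."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

# Hand-picked weights (an assumption for illustration, not the result of training).
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[1.0, -2.0]]
OUTPUT_B = [0.0]

def forward(x):
    """Forward pass: hidden layer with ReLU, then a linear output unit."""
    hidden = relu(dense(x, HIDDEN_W, HIDDEN_B))
    return dense(hidden, OUTPUT_W, OUTPUT_B)[0]

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, forward(x))
```

The hidden layer builds an intermediate representation of the inputs that makes XOR, which no single linear unit can compute, expressible by the output unit.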
Assessment methods
Project and defense
Manager
Emilien Peltier
Further information
Companies, Trades and Cultures
Assessment of knowledge
Initial assessment / Main session - Tests
Type of assessment | Type of test | Duration (in minutes) | Number of tests | Test coefficient | Eliminatory mark in the test | Remarks |
---|---|---|---|---|---|---|
Continuous assessment | Skills assessment | | | | | |
Second chance / Catch-up session - Tests
Type of assessment | Type of test | Duration (in minutes) | Number of tests | Test coefficient | Eliminatory mark in the test | Remarks |
---|---|---|---|---|---|---|
Continuous assessment | Skills assessment | | | | | |