
Search, extraction and visualization

  • School / Prep

    ENSEIRB-MATMECA

Internal code

EI9IS329

Description

This course has two aims and is structured around two projects.

The first project presents and implements techniques for extracting information from textual data.

We will first see how proven techniques such as the Bag-of-Words model and TF-IDF weighting can be used to extract relevant information from documents.
We will then look at vector embedding methods, in particular the Word2Vec model, which captures the context in which words appear.
Finally, we will see how this information can be used to identify semantically related texts or to group them into categories using clustering algorithms.
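As a rough illustration of the text pipeline described above (Bag-of-Words counts, TF-IDF weighting, then grouping related texts), here is a minimal pure-Python sketch. It is not course material: the naive tokenizer, the TF-IDF variant (tf = count / document length, idf = log(N / df)) and all function names are assumptions chosen for brevity. For the clustering step it swaps in a simple single-linkage grouping (documents whose cosine similarity exceeds a threshold end up in the same cluster) rather than an algorithm such as k-means, which the course may use instead. Word2Vec is not covered here.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    # Assumed naive tokenizer: split on whitespace, strip basic
    # punctuation, lowercase, and drop empty tokens.
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def bag_of_words(tokens):
    # Bag-of-Words: word order is discarded, only term counts are kept.
    return Counter(tokens)


def tf_idf(docs):
    """Return one sparse {term: weight} vector per document.

    Variant used here (one of several): tf = count / document length,
    idf = log(N / df), where df is the number of documents containing the term.
    """
    bows = [bag_of_words(tokenize(d)) for d in docs]
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # each term counted once per document
    n_docs = len(docs)
    vectors = []
    for bow in bows:
        length = sum(bow.values()) or 1  # avoid dividing by zero on empty docs
        vectors.append(
            {t: (c / length) * math.log(n_docs / df[t]) for t, c in bow.items()}
        )
    return vectors


def cosine(u, v):
    # Cosine similarity between two sparse vectors (0.0 if either is all zeros).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(vectors, threshold):
    # Single-linkage grouping: link every pair whose similarity exceeds the
    # threshold, then return the connected components (union-find).
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = ["cats chase mice", "cats hunt mice",
            "stocks fell sharply", "stocks rose sharply"]
    vecs = tf_idf(docs)
    print(round(cosine(vecs[0], vecs[1]), 3))  # 0.333
    print(cluster(vecs, 0.2))                  # [[0, 1], [2, 3]]
```

The two texts about cats share weighted terms, as do the two about stocks, while cross-topic pairs share none and score 0, so the threshold separates them into two groups. In practice, libraries such as scikit-learn (`TfidfVectorizer`, `KMeans`) provide tuned versions of these steps; their weights will differ from this sketch because `TfidfVectorizer` uses a smoothed idf and normalizes vectors by default.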

The second project will address a similar problem, but in the context of visual data.

The lectures will be accompanied by tutorials and practical sessions (TD/TP) in which students implement the algorithms presented above.

The two projects, one on textual data and the other on visual data, are both based on real datasets. They let students apply the algorithms seen in lectures and draw on their distributed-computing skills to process the large volume of data in a reasonable time.


Teaching hours

  • Tutorial (TD): 21.33 h

Mandatory prerequisites

Basic knowledge of Python, algorithms and linear algebra


Further information

Natural language processing (NLP) is one of today's major challenges in artificial intelligence. Advances in this field are used daily in search engines, chatbots and email services (spam detection, ad targeting, etc.).
Likewise, extracting information from large collections of images is another major challenge today, with applications as varied as the automatic detection of people and road signs, or object recognition.


Assessment of knowledge

Initial assessment / Main session - Tests

Type of assessment | Type of test | Duration (in minutes) | Number of tests | Test coefficient | Eliminatory mark in the test | Remarks
Project | Continuous control | 1