Publications
This page provides a curated list of my academic publications, including journal articles, conference papers, workshop papers, and preprints.
For the most up-to-date list, including citations and recent additions, please visit my Google Scholar profile.
2025
- ACL 2025. Commonsense Reasoning in Arab Culture. Abdelrahman Sadallah, Junior Cedric Tonga, Khalid Almubarak, Saeed Almheiri, Farah Atif, Chatrine Qwaider, Karima Kadaoui, Sara Shatnawi, Yaser Alesh, and Fajri Koto. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025.
Despite progress in Arabic large language models, such as Jais and AceGPT, their evaluation on commonsense reasoning has largely relied on machine-translated datasets, which lack cultural depth and may introduce Anglocentric biases. Commonsense reasoning is shaped by geographical and cultural contexts, and existing English datasets fail to capture the diversity of the Arab world. To address this, we introduce ArabCulture, a commonsense reasoning dataset in Modern Standard Arabic (MSA), covering cultures of 13 countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset was built from scratch by engaging native speakers to write and validate culturally relevant questions for their respective countries. ArabCulture spans 12 daily life domains with 54 fine-grained subtopics, reflecting various aspects of social norms, traditions, and everyday experiences. Zero-shot evaluations show that open-weight language models with up to 32B parameters struggle to comprehend diverse Arab cultures, with performance varying across regions. These findings highlight the need for more culturally aware models and datasets tailored to the Arabic-speaking world.
@inproceedings{sadallah-etal-2025-commonsense,
  title = {Commonsense Reasoning in {A}rab Culture},
  author = {Sadallah, Abdelrahman and Tonga, Junior Cedric and Almubarak, Khalid and Almheiri, Saeed and Atif, Farah and Qwaider, Chatrine and Kadaoui, Karima and Shatnawi, Sara and Alesh, Yaser and Koto, Fajri},
  editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.acl-long.380/},
  pages = {7695--7710},
  isbn = {979-8-89176-251-0},
}
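As a companion to the zero-shot evaluation described in the abstract, here is a minimal sketch of a multiple-choice accuracy loop. The prompt wording, the `question`/`options`/`answer` field names, and the `generate` callable are illustrative assumptions, not the paper's released evaluation harness.

```python
# Sketch of a zero-shot multiple-choice evaluation loop (assumed data format).
def pick_answer(generate, question, options):
    """Ask the model to choose one lettered option; return the letter."""
    letters = "ABCD"[: len(options)]
    prompt = (
        question
        + "\n"
        + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
        + "\nAnswer with a single letter."
    )
    reply = generate(prompt)  # any LLM call: API client, local pipeline, ...
    for ch in reply.strip().upper():
        if ch in letters:
            return ch
    return None  # an unparseable reply counts as wrong

def zero_shot_accuracy(generate, examples):
    correct = sum(
        pick_answer(generate, ex["question"], ex["options"]) == ex["answer"]
        for ex in examples
    )
    return correct / len(examples)

if __name__ == "__main__":
    # Toy stand-in for a culturally grounded MCQ item (hypothetical content).
    examples = [{"question": "...", "options": ["...", "...", "...", "..."], "answer": "A"}]
    print(zero_shot_accuracy(lambda p: "A", examples))  # stub model always answers "A"
```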
- Under review. Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback. Junior Cedric Tonga, KV Aditya Srivatsa, Kaushal Kumar Maurya, Fajri Koto, and Ekaterina Kochmar. Jul 2025.
Large language models (LLMs) have demonstrated the ability to generate formative feedback and instructional hints in English, making them increasingly relevant for AI-assisted education. However, their ability to provide effective instructional support across different languages, especially for mathematically grounded reasoning tasks, remains largely unexamined. In this work, we present the first large-scale simulation of multilingual tutor-student interactions using LLMs. A stronger model plays the role of the tutor, generating feedback in the form of hints, while a weaker model simulates the student. We explore 352 experimental settings across 11 typologically diverse languages, four state-of-the-art LLMs, and multiple prompting strategies to assess whether language-specific feedback leads to measurable learning gains. Our study examines how student input language, teacher feedback language, model choice, and language resource level jointly influence performance. Results show that multilingual hints can significantly improve learning outcomes, particularly in low-resource languages when feedback is aligned with the student’s native language. These findings offer practical insights for developing multilingual, LLM-based educational tools that are both effective and inclusive.
@misc{tonga2025simulatingllmtollmtutoringmultilingual,
  title = {Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback},
  author = {Tonga, Junior Cedric and Srivatsa, KV Aditya and Maurya, Kaushal Kumar and Koto, Fajri and Kochmar, Ekaterina},
  year = {2025},
  eprint = {2506.04920},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
}
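The tutor-student simulation above lends itself to a compact sketch. The `simulate_round` helper and its prompt templates below are hypothetical; `tutor` and `student` stand for any prompt-to-text callables (API clients or local models), not the authors' actual code.

```python
# Sketch of one tutor-student round: a stronger "tutor" model gives a hint in
# a chosen feedback language, and a weaker "student" model revises its answer.
def simulate_round(tutor, student, problem, student_lang, feedback_lang):
    attempt = student(
        f"Solve this math problem. Answer in {student_lang}.\n{problem}"
    )
    hint = tutor(
        "A student answered the problem below incorrectly or incompletely.\n"
        f"Problem: {problem}\nStudent answer: {attempt}\n"
        f"Give one short pedagogical hint in {feedback_lang}. Do not reveal the answer."
    )
    revised = student(
        f"Problem: {problem}\nYour previous answer: {attempt}\n"
        f"Tutor hint: {hint}\nRevise your answer in {student_lang}."
    )
    return attempt, hint, revised
```

In the study's terms, sweeping `student_lang` and `feedback_lang` over the 11 languages and swapping the two callables across model pairs would reproduce a grid of experimental settings like the one described.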
2024
- NeurIPS 2024 FM-Assess Workshop. Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology. Junior Cedric Tonga, Benjamin Clement, and Pierre-Yves Oudeyer. In Proceedings of Large Foundation Models for Educational Assessment, 15–16 Dec 2024.
The automatic generation of hints by Large Language Models (LLMs) within Intelligent Tutoring Systems (ITSs) has shown potential to enhance student learning. However, generating pedagogically sound hints that address student misconceptions and adhere to specific educational objectives remains challenging. This work explores using LLMs (GPT-4o and Llama-3-8B-Instruct) as teachers to generate effective hints for students simulated through LLMs (GPT-3.5-turbo, Llama-3-8B-Instruct, or Mistral-7B-Instruct-v0.3) tackling math exercises designed for high-school students according to cognitive science principles. We study several dimensions: 1) identifying error patterns made by simulated students on secondary-level math exercises; 2) developing various prompts for GPT-4o as a teacher and evaluating their effectiveness in generating hints that enable simulated students to self-correct; and 3) testing the best-performing prompts, based on their ability to produce relevant hints and facilitate error correction, with Llama-3-8B-Instruct as the teacher, allowing for a performance comparison with GPT-4o. The results show that model errors increase with higher temperature settings. Notably, when hints are generated by GPT-4o, the most effective prompts include those tailored to specific errors as well as those providing general hints based on common mathematical errors. Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o. Moreover, the problem-solving and response-revision capabilities of the LLMs as students, particularly GPT-3.5-turbo, improved significantly after receiving hints, especially at lower temperature settings. However, models like Mistral-7B-Instruct demonstrated a decline in performance as the temperature increased. This study advances our understanding of the potential and limitations of LLMs in educational contexts, working towards integrating these models into pedagogically grounded tutoring systems.
@inproceedings{pmlr-v264-tonga25a,
  title = {Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology},
  author = {Tonga, Junior Cedric and Clement, Benjamin and Oudeyer, Pierre-Yves},
  booktitle = {Proceedings of Large Foundation Models for Educational Assessment},
  pages = {61--102},
  year = {2024},
  editor = {Li, Sheng and Cui, Zhongmin and Lu, Jiasen and Harris, Deborah and Jing, Shumin},
  volume = {264},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v264/tonga25a.html},
}
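To make the prompt-strategy comparison concrete, here is a hedged sketch contrasting an error-specific hint prompt with a general one based on common mathematical errors. The templates, the `self_correction_rate` helper, and the `is_correct` checker are illustrative assumptions rather than the study's exact prompts.

```python
# Sketch of comparing hint-prompting strategies (templates are assumptions).
HINT_PROMPTS = {
    # Prompt tailored to the specific error the student made.
    "error_specific": (
        "The student solved '{problem}' and answered '{attempt}', which is wrong.\n"
        "Identify the likely mistake and give a hint that targets it."
    ),
    # General hint grounded in common mathematical errors for this topic.
    "general": (
        "A student answered '{attempt}' to '{problem}'.\n"
        "Give a hint based on common errors students make on such problems."
    ),
}

def self_correction_rate(teacher, student, items, strategy, is_correct):
    """Fraction of wrong first attempts the student fixes after one hint."""
    fixed = total = 0
    for problem, solution in items:
        attempt = student(f"Solve: {problem}")
        if is_correct(attempt, solution):
            continue  # only hint on incorrect attempts
        total += 1
        hint = teacher(HINT_PROMPTS[strategy].format(problem=problem, attempt=attempt))
        retry = student(f"Problem: {problem}\nHint: {hint}\nTry again.")
        fixed += is_correct(retry, solution)
    return fixed / total if total else 0.0
```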
- IWCMC 2024. AfriDial: African Dialect Model based on Deep Learning for Sentiment Analysis. Ameni Sassi, Junior Tonga, Stéphanie Poaty, Sanon Steve, Djibrine Idriss Abakar Adjid, Moukhtar Cherif, and Wael Ouarda. In 2024 International Wireless Communications and Mobile Computing (IWCMC), 2024.
This paper presents the African Dialect Dataset for Sentiment Analysis (AfriDial), a new natural language processing dataset. This dataset is intended to aid in the classification of multilingual human text written in the mother tongue. The AfriDial dataset includes around 14k documents in seven distinct dialects: Tunisian, Moroccan, Chadian, Mauritanian, Burkinabè, Cameroonian, and Congolese. The documents, which cover a wide range of subjects such as politics, sports, entertainment, and technology, were gathered from open social media and crowdsourcing. Each document in the dataset is assigned one of three sentiment classes: positive, negative, or neutral. The AfriDial dataset will be an important resource for researchers working on multilingual text classification and natural language processing (NLP). The paper also presents a baseline model using transfer learning of the bidirectional encoder representations from transformers (BERT) architecture on the AfriDial dataset. An experimental study is presented to introduce more methods and contributions to the field of dialectal NLP.
@inproceedings{10592398,
  author = {Sassi, Ameni and Tonga, Junior and Poaty, Stéphanie and Steve, Sanon and Adjid, Djibrine Idriss Abakar and Cherif, Moukhtar and Ouarda, Wael},
  booktitle = {2024 International Wireless Communications and Mobile Computing (IWCMC)},
  title = {AfriDial: African Dialect Model based on Deep Learning for Sentiment Analysis},
  year = {2024},
  pages = {1248--1254},
  keywords = {Wireless communication; Sentiment analysis; Analytical models; Tongue; Social networking (online); Transfer learning; Text categorization; dataset; dialect; sentiment analysis; NLP; Africa},
  doi = {10.1109/IWCMC61514.2024.10592398},
}
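For readers who want a starting point, below is a minimal sketch of a BERT transfer-learning baseline for three-way sentiment classification, in the spirit of the paper's baseline. The checkpoint, hyperparameters, and dataset wiring are assumptions; the paper does not publish this exact setup.

```python
# Sketch of a BERT fine-tuning baseline for 3-way sentiment classification.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-multilingual-cased"  # assumed; any BERT variant works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=3  # positive / negative / neutral
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# `train_ds` / `eval_ds` would be datasets.Dataset objects built from the
# AfriDial documents and labels (the dataset's release format is not assumed here).
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="afridial-bert", num_train_epochs=3,
#                            per_device_train_batch_size=16,
#                            evaluation_strategy="epoch"),
#     train_dataset=train_ds.map(tokenize, batched=True),
#     eval_dataset=eval_ds.map(tokenize, batched=True),
# )
# trainer.train()
```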