EvidenceB, AI Research team, Paris, France (April 2024 – present)
NLP research engineer intern
Main question: How can small language models be trained to generate guidance cues (hints) that help learners (simulated here) solve mathematics exercises in an educational setting? These cues must adhere to a specified cognitive approach.
Keywords: Large Language Models, personalization, hint generation, prompting, in-context learning, human and automated evaluation, fine-tuning and distillation into smaller LLMs, RLHF, synthetic data generation, PEFT.
CIRST, Université du Québec à Montréal, Montréal (June 2023 – August 2023)
NLP research intern
- Study of the robustness of monolingual and multilingual language models to certain linguistic structures, with a view to accounting for the cultural diversity of people in distress in suicide prevention tools.
- Training of the XLM-R, distiluse-base and CamemBERT-base models on a set of French sentence pairs using the Simple Contrastive Learning of Sentence Embeddings (SimCSE) method.
- Our results indicate that pre-trained multilingual sentence embedding models perform well, but that after training, monolingual models perform better than multilingual models.
- Work presented at the ACFAS Congress 2024, held in Ottawa, Canada.
LISN (Laboratoire Interdisciplinaire des Sciences du Numérique), Gif-sur-Yvette, France (January 2023 – February 2023)
Research intern
- Introduction to research, spending one day a week in the laboratory with the spoken language processing team (TLP).
- Use of the RFE (recursive feature elimination) algorithm from scikit-learn and of decision trees to determine the most discriminating features.
- Cartographic representation of geographical areas according to the number of innovations in Latin.
Digital Research Center of Sfax, Sfax, Tunisia (February 2022 – May 2022)
NLP research intern, Brain4ICT Team
- Design of a platform for analyzing the sentiment of users writing in Francamglais, a Cameroonian dialect.
- Review of several research papers; collection of a dataset of product reviews in the Cameroonian dialect by scraping YouTube with a Python script; dataset cleaning; BERT-based sentiment analysis of the Cameroonian dialect; deployment of the model in a web application using Flask. The fine-tuned BERT model reached an accuracy of 86%.
- link to the demonstration video