Tonga Junior Cedric

EvidenceB, AI Research team, Paris, France (April 2024 – present)

NLP research engineer intern

  • Main question: How can we train small language models to generate guidance cues that enable learners (simulated here) to solve mathematics exercises in an educational setting? These cues must adhere to a specified cognitive approach.

  • Keywords: Large Language Models, personalization, hint generation, prompting, in-context learning, human and automated evaluation, fine-tuning and distillation into smaller LLMs, RLHF, synthetic data generation, PEFT.

CIRST, Université du Québec à Montréal, Montréal (June 2023 – August 2023)

NLP research intern

  • Study of the robustness of monolingual and multilingual language models to certain linguistic structures, with a view to accounting for the cultural diversity of people in distress in suicide prevention tools.
  • Training of the XLM-R, distiluse-base and CamemBERT-base models on a set of French sentence pairs using the Simple Contrastive Learning of Sentence Embeddings (SimCSE) method.
  • Our results indicate that pre-trained multilingual sentence embedding models perform well, but that after training, monolingual models outperform multilingual ones.
  • Work presented at the ACFAS Congress 2024, held in Ottawa, Canada.

LISN (Laboratoire Interdisciplinaire des Sciences du Numérique), Gif-sur-Yvette, France (January 2023 – February 2023)

Research intern

  • Introduction to the world of research through one day a week in the laboratory with the Spoken Language Processing team (TLP).

  • Use of the recursive feature elimination (RFE) algorithm from scikit-learn and of decision trees to determine the most discriminating features. Cartographic representation of geographical areas according to the number of innovations in Latin.

Digital Research Center of Sfax, Sfax, Tunisia (February 2022 – May 2022)

NLP research intern, Brain4ICT Team

  • Design of a platform for sentiment analysis of user text written in Francamglais, a Cameroonian dialect.
  • Review of several research papers; collection of a dataset of product reviews in the Cameroonian dialect by scraping YouTube with a Python script; dataset cleaning; BERT-based sentiment analysis of the Cameroonian dialect; deployment of the model in a web application using Flask. The fine-tuned BERT model reached an accuracy of 86%.
  • Link to the demonstration video