Publications
This page provides a curated list of my academic publications, including journal articles, conference papers, workshop papers, and preprints.
For the most up-to-date list, including citations and recent additions, please visit my Google Scholar profile.
2025
- ACL 2025. Commonsense Reasoning in Arab Culture. Abdelrahman Sadallah, Junior Cedric Tonga, Khalid Almubarak, Saeed Almheiri, Farah Atif, Chatrine Qwaider, Karima Kadaoui, Sara Shatnawi, Yaser Alesh, and Fajri Koto. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025.
Despite progress in Arabic large language models, such as Jais and AceGPT, their evaluation on commonsense reasoning has largely relied on machine-translated datasets, which lack cultural depth and may introduce Anglocentric biases. Commonsense reasoning is shaped by geographical and cultural contexts, and existing English datasets fail to capture the diversity of the Arab world. To address this, we introduce ArabCulture, a commonsense reasoning dataset in Modern Standard Arabic (MSA), covering cultures of 13 countries across the Gulf, Levant, North Africa, and the Nile Valley. The dataset was built from scratch by engaging native speakers to write and validate culturally relevant questions for their respective countries. ArabCulture spans 12 daily life domains with 54 fine-grained subtopics, reflecting various aspects of social norms, traditions, and everyday experiences. Zero-shot evaluations show that open-weight language models with up to 32B parameters struggle to comprehend diverse Arab cultures, with performance varying across regions. These findings highlight the need for more culturally aware models and datasets tailored to the Arabic-speaking world.
@inproceedings{sadallah-etal-2025-commonsense,
  title = {Commonsense Reasoning in {A}rab Culture},
  author = {Sadallah, Abdelrahman and Tonga, Junior Cedric and Almubarak, Khalid and Almheiri, Saeed and Atif, Farah and Qwaider, Chatrine and Kadaoui, Karima and Shatnawi, Sara and Alesh, Yaser and Koto, Fajri},
  editor = {Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month = jul,
  year = {2025},
  address = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.acl-long.380/},
  pages = {7695--7710},
  isbn = {979-8-89176-251-0},
}
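As a companion to the zero-shot evaluation described in the abstract, here is a minimal sketch of a multiple-choice accuracy loop. The prompt wording, the `question`/`options`/`answer` field names, and the `generate` callable are illustrative assumptions, not the paper's released evaluation harness.

```python
# Sketch of a zero-shot multiple-choice evaluation loop (assumed data format).
def pick_answer(generate, question, options):
    """Ask the model to choose one lettered option; return the letter."""
    letters = "ABCD"[: len(options)]
    prompt = (
        question
        + "\n"
        + "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
        + "\nAnswer with a single letter."
    )
    reply = generate(prompt)  # any LLM call: API client, local pipeline, ...
    for ch in reply.strip().upper():
        if ch in letters:
            return ch
    return None  # an unparseable reply counts as wrong

def zero_shot_accuracy(generate, examples):
    correct = sum(
        pick_answer(generate, ex["question"], ex["options"]) == ex["answer"]
        for ex in examples
    )
    return correct / len(examples)

if __name__ == "__main__":
    # Toy stand-in for a culturally grounded MCQ item (hypothetical content).
    examples = [{"question": "...", "options": ["...", "...", "...", "..."], "answer": "A"}]
    print(zero_shot_accuracy(lambda p: "A", examples))  # stub model always answers "A"
```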
- Under review. Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback. Junior Cedric Tonga, KV Aditya Srivatsa, Kaushal Kumar Maurya, Fajri Koto, and Ekaterina Kochmar. Jul 2025.
Large language models (LLMs) have demonstrated the ability to generate formative feedback and instructional hints in English, making them increasingly relevant for AI-assisted education. However, their ability to provide effective instructional support across different languages, especially for mathematically grounded reasoning tasks, remains largely unexamined. In this work, we present the first large-scale simulation of multilingual tutor-student interactions using LLMs. A stronger model plays the role of the tutor, generating feedback in the form of hints, while a weaker model simulates the student. We explore 352 experimental settings across 11 typologically diverse languages, four state-of-the-art LLMs, and multiple prompting strategies to assess whether language-specific feedback leads to measurable learning gains. Our study examines how student input language, teacher feedback language, model choice, and language resource level jointly influence performance. Results show that multilingual hints can significantly improve learning outcomes, particularly in low-resource languages when feedback is aligned with the student’s native language. These findings offer practical insights for developing multilingual, LLM-based educational tools that are both effective and inclusive.
@misc{tonga2025simulatingllmtollmtutoringmultilingual,
  title = {Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback},
  author = {Tonga, Junior Cedric and Srivatsa, KV Aditya and Maurya, Kaushal Kumar and Koto, Fajri and Kochmar, Ekaterina},
  year = {2025},
  eprint = {2506.04920},
  archiveprefix = {arXiv},
  primaryclass = {cs.CL},
}
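The tutor-student simulation above lends itself to a compact sketch. The `simulate_round` helper and its prompt templates below are hypothetical; `tutor` and `student` stand for any prompt-to-text callables (API clients or local models), not the authors' actual code.

```python
# Sketch of one tutor-student round: a stronger "tutor" model gives a hint in
# a chosen feedback language, and a weaker "student" model revises its answer.
def simulate_round(tutor, student, problem, student_lang, feedback_lang):
    attempt = student(
        f"Solve this math problem. Answer in {student_lang}.\n{problem}"
    )
    hint = tutor(
        "A student answered the problem below incorrectly or incompletely.\n"
        f"Problem: {problem}\nStudent answer: {attempt}\n"
        f"Give one short pedagogical hint in {feedback_lang}. Do not reveal the answer."
    )
    revised = student(
        f"Problem: {problem}\nYour previous answer: {attempt}\n"
        f"Tutor hint: {hint}\nRevise your answer in {student_lang}."
    )
    return attempt, hint, revised
```

In the study's terms, sweeping `student_lang` and `feedback_lang` over the 11 languages and swapping the two callables across model pairs would reproduce a grid of experimental settings like the one described.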
2024
- NeurIPS 2024 FM-Assess Workshop. Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology. Junior Cedric Tonga, Benjamin Clement, and Pierre-Yves Oudeyer. In Proceedings of Large Foundation Models for Educational Assessment, 15–16 Dec 2024.
The automatic generation of hints by Large Language Models (LLMs) within Intelligent Tutoring Systems (ITSs) has shown potential to enhance student learning. However, generating pedagogically sound hints that address student misconceptions and adhere to specific educational objectives remains challenging. This work explores using LLMs (GPT-4o and Llama-3-8B-Instruct) as teachers to generate effective hints for students simulated through LLMs (GPT-3.5-turbo, Llama-3-8B-Instruct, or Mistral-7B-Instruct-v0.3) tackling math exercises designed for high-school students according to cognitive science principles. We study several dimensions: 1) identifying error patterns made by simulated students on secondary-level math exercises; 2) developing various prompts for GPT-4o as a teacher and evaluating their effectiveness in generating hints that enable simulated students to self-correct; and 3) testing the best-performing prompts, based on their ability to produce relevant hints and facilitate error correction, with Llama-3-8B-Instruct as the teacher, allowing for a performance comparison with GPT-4o. The results show that model errors increase with higher temperature settings. Notably, when hints are generated by GPT-4o, the most effective prompts include those tailored to specific errors as well as those providing general hints based on common mathematical errors. Interestingly, Llama-3-8B-Instruct as a teacher showed better overall performance than GPT-4o. Moreover, the problem-solving and response-revision capabilities of the LLMs as students, particularly GPT-3.5-turbo, improved significantly after receiving hints, especially at lower temperature settings. However, models like Mistral-7B-Instruct demonstrated a decline in performance as the temperature increased. This study advances our understanding of the potential and limitations of LLMs in educational contexts, working towards integrating these models into pedagogically grounded tutoring systems.
@inproceedings{pmlr-v264-tonga25a,
  title = {Automatic Generation of Question Hints for Mathematics Problems using Large Language Models in Educational Technology},
  author = {Tonga, Junior Cedric and Clement, Benjamin and Oudeyer, Pierre-Yves},
  booktitle = {Proceedings of Large Foundation Models for Educational Assessment},
  pages = {61--102},
  year = {2024},
  editor = {Li, Sheng and Cui, Zhongmin and Lu, Jiasen and Harris, Deborah and Jing, Shumin},
  volume = {264},
  series = {Proceedings of Machine Learning Research},
  month = {15--16 Dec},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v264/tonga25a.html},
}
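To make the prompt-strategy comparison concrete, here is a hedged sketch contrasting an error-specific hint prompt with a general one based on common mathematical errors. The templates, the `self_correction_rate` helper, and the `is_correct` checker are illustrative assumptions rather than the study's exact prompts.

```python
# Sketch of comparing hint-prompting strategies (templates are assumptions).
HINT_PROMPTS = {
    # Prompt tailored to the specific error the student made.
    "error_specific": (
        "The student solved '{problem}' and answered '{attempt}', which is wrong.\n"
        "Identify the likely mistake and give a hint that targets it."
    ),
    # General hint grounded in common mathematical errors for this topic.
    "general": (
        "A student answered '{attempt}' to '{problem}'.\n"
        "Give a hint based on common errors students make on such problems."
    ),
}

def self_correction_rate(teacher, student, items, strategy, is_correct):
    """Fraction of wrong first attempts the student fixes after one hint."""
    fixed = total = 0
    for problem, solution in items:
        attempt = student(f"Solve: {problem}")
        if is_correct(attempt, solution):
            continue  # only hint on incorrect attempts
        total += 1
        hint = teacher(HINT_PROMPTS[strategy].format(problem=problem, attempt=attempt))
        retry = student(f"Problem: {problem}\nHint: {hint}\nTry again.")
        fixed += is_correct(retry, solution)
    return fixed / total if total else 0.0
```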
- IWCMC 2024. AfriDial: African Dialect Model based on Deep Learning for Sentiment Analysis. Ameni Sassi, Junior Tonga, Stéphanie Poaty, Sanon Steve, Djibrine Idriss Abakar Adjid, Moukhtar Cherif, and Wael Ouarda. In 2024 International Wireless Communications and Mobile Computing (IWCMC), 2024.
This paper presents the African Dialect Dataset for Sentiment Analysis (AfriDial), a new natural language processing dataset. This dataset is intended to aid in the classification of multilingual human text written in the mother tongue. The AfriDial dataset includes around 14k documents in seven distinct dialects: Tunisian, Moroccan, Chadian, Mauritanian, Burkinabè, Cameroonian, and Congolese. The documents, which cover a wide range of subjects such as politics, sports, entertainment, and technology, were gathered from open social media and crowdsourcing. Each document in the dataset is assigned one of three sentiment classes: positive, negative, or neutral. The AfriDial dataset will be an important resource for researchers working on multilingual text classification and natural language processing (NLP). The paper also presents a baseline model using transfer learning of the bidirectional encoder representations from transformers (BERT) architecture on the AfriDial dataset. An experimental study is presented to introduce more methods and contributions to the field of dialectal NLP.
@inproceedings{10592398,
  author = {Sassi, Ameni and Tonga, Junior and Poaty, Stéphanie and Steve, Sanon and Adjid, Djibrine Idriss Abakar and Cherif, Moukhtar and Ouarda, Wael},
  booktitle = {2024 International Wireless Communications and Mobile Computing (IWCMC)},
  title = {AfriDial: African Dialect Model based on Deep Learning for Sentiment Analysis},
  year = {2024},
  pages = {1248--1254},
  keywords = {Wireless communication; Sentiment analysis; Analytical models; Tongue; Social networking (online); Transfer learning; Text categorization; dataset; dialect; sentiment analysis; NLP; Africa},
  doi = {10.1109/IWCMC61514.2024.10592398},
}
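For readers who want a starting point, below is a minimal sketch of a BERT transfer-learning baseline for three-way sentiment classification, in the spirit of the paper's baseline. The checkpoint, hyperparameters, and dataset wiring are assumptions; the paper does not publish this exact setup.

```python
# Sketch of a BERT fine-tuning baseline for 3-way sentiment classification.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

checkpoint = "bert-base-multilingual-cased"  # assumed; any BERT variant works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=3  # positive / negative / neutral
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

# `train_ds` / `eval_ds` would be datasets.Dataset objects built from the
# AfriDial documents and labels (the dataset's release format is not assumed here).
# trainer = Trainer(
#     model=model,
#     args=TrainingArguments(output_dir="afridial-bert", num_train_epochs=3,
#                            per_device_train_batch_size=16,
#                            evaluation_strategy="epoch"),
#     train_dataset=train_ds.map(tokenize, batched=True),
#     eval_dataset=eval_ds.map(tokenize, batched=True),
# )
# trainer.train()
```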