Tonga Junior Cedric

Molecule Retrieval with Natural Language Queries
- Participated in a team challenge to identify the molecule (represented as a graph) that matches a given text query. Our approach comprises four blocks: text encoding, molecule encoding, modality alignment, and retrieval, using SciBERT, GTN, GPS, and other models. By combining several loss functions and exploring training strategies, we achieved a rank of 7 out of 52 teams.
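The retrieval block can be illustrated with a minimal sketch: once text and molecules are embedded in a shared space, candidates are ranked by similarity to the query. This is not the team's code. The function names (`cosine`, `rank_molecules`), the use of cosine similarity, and the toy 2-D vectors are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors (plain Python lists)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_molecules(text_embedding, molecule_embeddings):
    """Return candidate indices sorted from most to least similar to the query."""
    scored = [(cosine(text_embedding, m), i) for i, m in enumerate(molecule_embeddings)]
    scored.sort(key=lambda pair: -pair[0])
    return [i for _, i in scored]

# Hypothetical embeddings: in practice these would come from the text and graph encoders.
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = rank_molecules(query, candidates)
```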
Model compression using knowledge distillation and quantization
- The goal was to compress a U-Net model for groove segmentation using knowledge distillation and quantization, implemented manually without relying on PyTorch APIs.
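As an illustration of the distillation part, the sketch below computes a temperature-softened distillation loss by hand in plain Python. It is not the project's code. The function names (`log_softmax`, `distillation_loss`), the temperature value, and the example logits are assumptions. It follows a commonly used formulation: the KL divergence between the softened teacher and student distributions, scaled by the squared temperature.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2

# Hypothetical per-pixel class logits from a teacher and a smaller student.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, -0.5]
loss = distillation_loss(student, teacher)
```

For segmentation, a loss of this kind would typically be averaged over all pixels and combined with the usual supervised loss on the ground-truth masks.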
Lymphocytosis classification
- Participated as part of a team in a challenge on binary patient classification (reactive or malignant), for which we adopted a multimodal strategy. This involved building attribute-based and ResNet-based image models with aggregation methods, as well as Multiple Instance Learning with a custom aggregation inspired by focal loss. We finished 2nd out of 39 teams.
STYLE-TRANS-FAIR: Codabench challenge (a competition about bias in images)
- My classmates and I created a competition focused on bias in images. The objective was to perform multi-task classification using a training dataset biased towards different artistic styles. Participants had to develop robust systems that classify images without inadvertently learning features specific to the style-transfer bias. The training set was split so that each class had a dominant artistic style, whereas the test set was balanced to measure how well participants overcame the bias.
- See the competition page and the GitHub repository for more information.
Spark project (see GitHub repository)
- Built a real-time data processing and analysis pipeline on Reddit data using Spark Streaming and a streaming API. Developed a Python server for Reddit API integration, ran sentiment analysis with TextBlob, and used Spark Streaming with machine learning algorithms and window operations for the analysis. Visualized the resulting insights in real time. The project covered the full workflow, from data retrieval to analysis and result visualization.
Punctuation restoration project (see the repository for more details; the PowerPoint presentation will be there too)
- The aim was to train models capable of restoring punctuation to unpunctuated text.
- Relied on two papers ("Automatic punctuation restoration with BERT models" and "Punctuation Restoration using Transformer Models for High- and Low-Resource Languages") to implement the state of the art in this area.
- Used CBOW and BERT as embedding layers, with a bidirectional LSTM as the classifier, then a transformer and a directional LSTM.
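Punctuation restoration is often framed as sequence labelling: each word receives a label for the punctuation mark that follows it. The sketch below shows how training pairs could be built from punctuated text and how predicted labels map back to text. It is an illustration only. The label set (`PUNCT_LABELS`) and the helper names (`make_examples`, `restore`) are assumptions, not taken from the project.

```python
# Hypothetical label set: punctuation mark -> label predicted after a word.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}

def make_examples(text):
    """Split punctuated text into lowercase words and per-word punctuation labels."""
    words, labels = [], []
    for token in text.split():
        label = "O"  # "O" means no punctuation follows this word
        if token[-1] in PUNCT_LABELS:
            label = PUNCT_LABELS[token[-1]]
            token = token[:-1]
        if token:  # skip tokens that were only a punctuation mark
            words.append(token.lower())
            labels.append(label)
    return words, labels

def restore(words, labels):
    """Re-attach punctuation to words according to the given labels."""
    inverse = {label: mark for mark, label in PUNCT_LABELS.items()}
    return " ".join(w + inverse.get(l, "") for w, l in zip(words, labels))

words, labels = make_examples("Hello, how are you? Fine.")
restored = restore(words, labels)
```

A model such as the ones listed above would then be trained to predict `labels` from `words`, and `restore` would be applied to its predictions.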