Data Scientist AI Associate
IBM, 2023
- Used generative AI and fine-tuned BERT to analyze 4M conversations, cluster them, identify trends, and present the results to stakeholders.
- Built and deployed customer segmentation models handling 6M data points, integrating a full training pipeline with automated retraining.
- Engineered a production AI pipeline leveraging OCR/OMR, NER, LLMs, and speech-to-text for 10M+ assets, cutting digitization time to the scale of minutes.
- Applied ResNet (86% score) and CLIP (93% score) to evaluate the fidelity of OMR output (Audiveris) against the original digitized music scores.
- Initiated exploration of HPC (H100 GPUs) and a SLURM-based parallelization strategy, aiming to halve processing time for core digitization primitives.
- Reduced complex ETL query execution time from 20 hours to 1 hour by refactoring SQL and leveraging Spark on IBM Cloud with multithreading.