Domenico Lacavalla
Data Scientist - ML Engineer

I am a Data Scientist on a mission to use artificial intelligence to create solutions with tangible, positive impact. My passion lies at the intersection of cutting-edge research and hands-on application, where innovative ideas become robust and scalable systems.

At IBM, I work as a Data Scientist AI Associate, designing and deploying end-to-end NLP, Computer Vision, and LLM solutions for enterprise clients. My work focuses on turning massive datasets into actionable intelligence: I’ve engineered pipelines that analyze over 4 million conversations using fine-tuned BERT models and process more than 10 million assets with OCR, NER, and LLMs. I’m also deeply involved in performance optimization, having reduced ETL query times from 20 hours to 1 hour using Spark on IBM Cloud and initiated HPC strategies to accelerate processing tasks.

In parallel, I actively pursue research. I was a Google Summer of Code 2024 contributor with HumanAI, where I explored the evolution of language in Dark Web communities. Outside of work, I dedicate my free time to reproducing and extending recent NLP research papers, often tweaking methodologies or applying them to novel contexts. This hobby helps me stay close to the academic frontier and continuously refine my understanding of language models and representation learning.

This blend of industry experience and independent research reflects my drive to bridge theory and practice. Academically, I hold a Bachelor’s degree in Computer Science (110/110 with honors) and I’m currently pursuing a Master’s in Data Science, which I balance with my full-time role at IBM.

My long-term goal is to pursue a PhD in Natural Language Processing, where I can deepen my expertise and contribute to solving complex language problems, turning research into real-world impact.

Beyond my professional and academic pursuits, I have a passion for boxing, which teaches discipline and mental resilience, and I enjoy the intellectual challenge of solving Rubik’s cubes and other logic puzzles.

Tech Stack

Python, PyTorch, Transformers, Hugging Face, SQL, IBM Cloud, Spark, scikit-learn, Docker, OpenCV, OCR, Slurm