Musa Dildar Ahmed Cheema
Professional Data Scientist
I am a highly accomplished data science professional with a proven track record of delivering innovative, impactful solutions. I lead cutting-edge AI/ML projects and have earned recognition as a distinguished judge at premier competitions.
Career Aspiration: To spearhead transformative initiatives that redefine the future of Data Science and Artificial Intelligence.
Contact: mcheema2010@gmail.com

Experience
AVIRSO
03/2025 – Present
Tech Business Management Consultant
- Providing tech business management consulting as a personal venture.
- Working on strategic initiatives for a Fortune 10 company.
Teradata
01/2025 – 04/2025
Professional Data Scientist
- Worked on AI ASK SQL Model training and SQL generation.
- Developed Agentic inDB Workflow Generation.
- Benchmarked Teradata VectorStore and generated inDB ONNX Embeddings.
- Implemented inDB Intent Classification and Multi-Entity Tagging.
Teradata
01/2024 – 12/2024
Graduate Associate Data Science
- Earned a double promotion for outstanding performance on the Wells Fargo project.
- Developed a SQLMR function to tokenize input text into meaningful n-grams, enhancing sentiment analysis, topic identification, and document classification.
- Engineered a function for precise text tagging and improved NLP outcomes.
- Developed TD_GenAI functions for advanced text analytics within Teradata.
- Implemented an in-database BYO-LLM solution and a system to convert unstructured data into JSON.
- Initiated projects for in-database training (BYOM-LLM) and a parallel Torch NN-like library for ANN/CNN inference.
- Worked on a Document QA system and defect detection (Wafer in DB) using ML and extensive feature engineering.
My Impact Meter
02/2022 – 09/2023
Data Scientist (Part-time)
- Developed data visualization dashboards for the admin portal to bolster data-driven decision-making.
- Created an automated data pipeline for the Payment & Services division, reducing manual processing time by 80%.
Data Insight Lab
06/2022 – 06/2023
Research Assistant
- Developed ViLanOCR – a state-of-the-art method for extracting handwritten bilingual text from medical prescriptions.
- Addressed challenges in extracting handwritten Urdu by leveraging advanced ML techniques.
Knowledge Discovery & Data Science Lab
03/2021 – 08/2022
Research Assistant (Part-time)
- Contributed to an NCAI-funded project to redesign e-recruitment using AI for temporal analysis.
- Developed a novel resume ranking algorithm and optimized real-time entity enrichment from resumes.
Education
NUCES FAST ISB
Aug 2019 - Jun 2023
BS Data Science
Coursework: AI, NLP, Big Data, Distributed Data Engineering, DevOps/MLOps, and more.
Stanford University
Jun 2022 - Aug 2022
Summer School
Courses: CS229 Machine Learning, SOC-128D Social Data Science, etc.
Publications
An efficient algorithm for ranking candidates in e-recruitment system
Abdul Hanan Minhas, Mohammad Daniyal Shaiq, Saad Ali Qureshi, Musa Cheema, Shujaat Hussain, Kifayat Ullah Khan
A comprehensive approach that enhances candidate ranking in e-recruitment through advanced algorithms and graph-based techniques.
Feature-Wise Ranking of Candidates through Maximum Degrees in Hidden Bipartite Graphs
Sarah Kiyani, Musa Cheema, Saad Ali Qureshi, Shujaat Hussain, Kifayat Ullah Khan
Innovative graph-based techniques to improve candidate ranking accuracy in recruitment systems.
Transformer based Urdu Handwritten Text Optical Character Reader
Mohammad Daniyal Shaiq, Musa Dildar Ahmed Cheema, Ali Kamal
A pioneering OCR approach for handwritten Urdu text leveraging transformer architectures.
Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)
Musa Dildar Ahmed Cheema, Mohammad Daniyal Shaiq, Farhaan Mirza, Ali Kamal, M. Asif Naeem
This research introduces ViLanOCR—an innovative bilingual OCR system tailored for Urdu and English. Leveraging advanced multilingual transformer-based models, the approach achieves state-of-the-art performance on the Urdu UHWR dataset with a CER of 1.1%, surpassing existing baselines.
Judgeship
COMPPEC NUST EME 2024
Data Viz NASCON 2024
Code Craft NASCON 2024
Code Craft NASCON 2023
Data Quest NASCON 2023
Awards
Teradata Quarterly Award Q3
Teradata Project Team Award Q3
Teradata Quarterly Award Q2
Teradata Spot Award Jul 2024
Teradata Spot Award Apr 2024
Teradata Spot Award Dec 2023
Zindagi Awards Best FYP 2023
Bronze Award
Dean's List x3
Volunteer & Leadership
Vice Head of App Dev, Google Student Club (2020); NASCON Arrangements; and other leadership roles.
Projects
ViLanOCR
A multilingual OCR system for handwritten text, achieving a 1.1% CER on handwritten Urdu (UHWR dataset).
BART/LLama 7B SFT Trainer
A trainer implementation for fine-tuning LLMs on a single GPU using PEFT and the Hugging Face TRL SFTTrainer.
LLama2.C
Chat UI and socket server for LLama2 inference in C, enabling real-time CPU-based processing.