Musa Dildar Ahmed Cheema

Professional Data Scientist

I am a data scientist working on in-database AI/ML, with published research on Urdu OCR and candidate ranking for e-recruitment. I lead AI/ML projects and have served as a judge at competitions such as NASCON and COMPPEC.

Career Aspiration: To spearhead transformative initiatives that redefine the future of Data Science and Artificial Intelligence.

Contact: mcheema2010@gmail.com

Experience

AVIRSO

03/2025 – Present

Tech Business Management Consultant

  • Providing tech business management consulting as a personal venture.
  • Working on strategic initiatives for a Fortune 10 company.
Tools: Business Strategy, Technology Consulting, Project Management

Teradata

01/2025 – 04/2025

Professional Data Scientist

  • Worked on AI ASK SQL Model training and SQL generation.
  • Developed Agentic inDB Workflow Generation.
  • Benchmarked Teradata VectorStore and generated inDB ONNX Embeddings.
  • Implemented inDB Intent Classification and Multi-Entity Tagging.
Tools: Teradata, SQLMR, C++, Python, Cloud Platforms

Teradata

01/2024 – 12/2024

Graduate Associate Data Science

  • Outstanding performance on the Wells Fargo project led to a double promotion.
  • Developed a SQLMR function to tokenize input text into meaningful n-grams, enhancing sentiment analysis, topic identification, and document classification.
  • Engineered a function for precise text tagging and improved NLP outcomes.
  • Developed TD_GenAI functions for advanced text analytics within Teradata.
  • Implemented an in-database BYO-LLM solution and a system to convert unstructured data into JSON.
  • Initiated projects for in-database training (BYOM-LLM) and a parallel Torch NN-like library for ANN/CNN inference.
  • Worked on a Document QA system and defect detection (Wafer in DB) using ML and extensive feature engineering.
Tools: C, C++, Java, Regex; vLLM, Python, Hugging Face, Teradata Vantage, AWS, VCL, BYOM, PyTorch, ML, UDFs

My Impact Meter

02/2022 – 09/2023

Data Scientist (Part-time)

  • Developed data visualization dashboards for the admin portal to bolster data-driven decision-making.
  • Created an automated data pipeline for the Payment & Services division, reducing manual processing time by 80%.
Tools: Dashboarding Tools, Data Pipeline Automation

Data Insight Lab

06/2022 – 06/2023

Research Assistant

  • Developed ViLanOCR – a state-of-the-art method for extracting handwritten bilingual text from medical prescriptions.
  • Addressed challenges in extracting handwritten Urdu by leveraging advanced ML techniques.
Tools: Python, Machine Learning, OCR, Deep Learning

Knowledge Discovery & Data Science Lab

03/2021 – 08/2022

Research Assistant (Part-time)

  • Contributed to an NCAI-funded project to re-design e-recruitment using AI for temporal analysis.
  • Developed a novel resume ranking algorithm and optimized real-time entity enrichment from resumes.
Tools: AI, Machine Learning, Graph Databases, Optimization

Education

NUCES FAST ISB

Aug 2019 - Jun 2023

BS Data Science

Coursework: AI, NLP, Big Data, Distributed Data Engineering, DevOps/MLOps, and more.

Stanford University

Jun 2022 - Aug 2022

Summer School

Courses: CS229 Machine Learning, SOC-128D Social Data Science, etc.

Publications

An efficient algorithm for ranking candidates in e-recruitment system

Abdul Hanan Minhas, Mohammad Daniyal Shaiq, Saad Ali Qureshi, Musa Cheema, Shujaat Hussain, Kifayat Ullah Khan

A comprehensive approach that enhances candidate ranking in e-recruitment through advanced algorithms and graph-based techniques.

Feature-Wise Ranking of Candidates through Maximum Degrees in Hidden Bipartite Graphs

Sarah Kiyani, Musa Cheema, Saad Ali Qureshi, Shujaat Hussain, Kifayat Ullah Khan

Innovative graph-based techniques to improve candidate ranking accuracy in recruitment systems.

Transformer based Urdu Handwritten Text Optical Character Reader

Mohammad Daniyal Shaiq, Musa Dildar Ahmed Cheema, Ali Kamal

A pioneering OCR approach for handwritten Urdu text leveraging transformer architectures.

Adapting multilingual vision language transformers for low-resource Urdu optical character recognition (OCR)

Musa Dildar Ahmed Cheema, Mohammad Daniyal Shaiq, Farhaan Mirza, Ali Kamal, M. Asif Naeem

This research introduces ViLanOCR, a bilingual OCR system for Urdu and English. Using multilingual transformer-based models, the approach achieves state-of-the-art performance on the Urdu UHWR dataset with a CER of 1.1%, surpassing existing baselines.

Judgeship

COMPPEC NUST EME 2024

Data Viz NASCON 2024

Code Craft NASCON 2024

Code Craft NASCON 2023

Data Quest NASCON 2023

Awards

Teradata Quarterly Award Q3

Teradata Project Team Award Q3

Teradata Quarterly Award Q2

Teradata Spot Award Jul 2024

Teradata Spot Award Apr 2024

Teradata Spot Award Dec 2023

Zindagi Awards Best FYP 2023

Bronze Award

Dean's List x3

Volunteer & Leadership

Google Student Club Vice Head App Dev 2020, NASCON Arrangements, and other leadership roles.

Projects

ViLanOCR

A multilingual OCR system for handwritten text achieving a 1.2% CER on handwritten Urdu.

Transformers, ViT, HuggingFace

BART/LLama 7B SFT Trainer

A trainer implementation for LLMs on a single GPU using PEFT and Hugging Face’s SFT Trainer.

LLM, PEFT, HuggingFace

LLama2.C

Chat UI and socket server for LLama2 inference in C, enabling real-time CPU-based processing.

LLama, C, WebSocket