Musa Dildar Ahmed Cheema

LinkedIn

Data Scientist

• AI/ML

• DevOps/MLOps

• NLP

• LLMs

• AGI

• Space Enthusiast

Contact me at mcheema2010@gmail.com

Who Am I?

I am Musa Dildar Ahmed Cheema, a passionate and versatile individual with a strong foundation in Data Science. My journey has been guided by a thirst for knowledge and a drive to make a meaningful impact through technology. I believe that technology has the potential to transform lives, industries, and societies, and I've dedicated myself to harnessing this potential for the betterment of our world.

My Education

NUCES FAST ISB

Aug 2019 - June 2023

BS Data Science

Coursework: AI, NLP, Big Data, Distributed Data Engineering, DevOps, MLOps, Data Mining, Data Analysis & Visualization, Operating Systems, Computer Networks, Algorithms

Stanford University

June 2022 - Aug 2022

Stanford Summer School, California

Coursework: CS229 Machine Learning, SOC-128D Mining Culture Through Text Data: Introduction to Social Data Science

Career Aspiration

My career aspiration is to become a visionary leader and innovator in the field of Data Science and Artificial Intelligence. I envision myself driving transformative change through the creative and responsible application of cutting-edge technologies to address real-world challenges. One of my most significant goals is to pursue a Master's degree at Stanford. Ultimately, I hope to leave a lasting legacy as someone who not only contributed technically but also inspired and empowered others to reach new heights.

Experience

My Impact Meter

• Data Scientist

Feb 2022 - Present

  - Developed data visualization dashboards for informed decision-making.

  - Automated the Payment & Services data pipeline, reducing manual processing by 80% and enhancing efficiency.

  - Developed a live leaderboard that shows the impact created by Impactors in real time.

• Developer

Aug 2020 - Feb 2022

  - Built a secure, scalable MVP using Flutter.

  - Demonstrated the potential of our product concept.

  - Collaborated with the company's CEO, COO, and CFO.

Python, Flask, PyTorch, Transformers, Node.js, Flutter, Next.js, React.js, Plotly, scikit-learn, Matplotlib, Socket.io, MongoDB, SQL, Power BI, AWS, Docker

Data Insight Lab

• Research Assistant

Aug 2022 - June 2023

  - Engaged in an R&D-based Final Year Project focusing on "A solution to extract entities from unstructured handwritten bilingual medical prescriptions."

  - Pioneered the development of a state-of-the-art (SOTA) method named ViLanOCR, specialized in extracting handwritten text from bilingual Medical Prescriptions (Urdu and English).

  - Successfully resolved the longstanding challenge of extracting handwritten Urdu text, contributing to enhanced accuracy and efficiency in medical data processing.

  - Working on publishing the results in a Q1 journal.

PyTorch, Python, MongoDB, Socket.io, Transformers, TensorFlow, LLMs, YOLO, React, Flutter, LabelImg, LaTeX, Matplotlib, Docker, Kubernetes, DVC, AWS, Hugging Face, GitHub Actions, Wikipedia

Knowledge Data & Discovery Lab

• Research Assistant

Mar 2021 - Aug 2022

  - Contributed to a funded project by the National Center for Artificial Intelligence (NCAI), titled "Re-Designing E-recruitment using AI for Temporal Analysis."

  - Designed and developed a novel resume ranking algorithm for a recommender system.

  - Assisted in optimizing disk I/O for real-time entity enrichment from resumes in a graph database (ontology).

  - Worked with two PhD professors and five master's students, and successfully published two research papers.

Python, NLP, Ontology, Neo.Js, ElasticDB, AWS, OpenCL, Bloom filter, LaTeX, React, MongoDB, SQL, GitHub

NUCES FAST ISB

• Teaching Assistant

Jan 2023 - June 2023

  - Big Data and Analytics, and Database Systems.

  - Taught 130 students across both courses.

  - Pioneered an interactive teaching strategy, focusing on project-based learning to cultivate practical skills.

  - Facilitated hands-on learning, preparing students for real-world challenges in data management and analytics.

• Teaching Assistant

Aug 2022 - Dec 2022

  - Programming Fundamentals (PF).

  - Taught 300 students.

  - Designed comprehensive assignments focusing on the fundamental concepts of C++ programming.

  - Guided students through real-world coding challenges, enhancing problem-solving and algorithmic thinking skills.

  - Provided personalized feedback and assistance to ensure a strong grasp of core programming principles.

• Teaching Assistant

Jan 2022 - June 2022

  - Big Data and Analytics (BDA).

  - Taught 50 students.

  - Crafted diverse assignments involving scalable data pipelines with Hadoop and Spark, and optimized database schema design.

  - Developed a comprehensive project that leveraged Kafka for mobile sensor data acquisition, orchestrating the entire pipeline from data ingestion to machine learning-driven analysis.

  - The initiative not only exposed students to real-time data handling but also provided insights into machine learning integration for informed decision-making.

Python, SQL, MongoDB, Hive, Hadoop, PySpark, Cassandra, Kafka, Pub/Sub, AWS, Flask, Socket.io, Big Data Algorithms

Volunteer Work

 - Google Student Club, Vice Head of App Development, 2020

 - NASCOM Arrangements

Projects

ViLanOCR

Multilingual OCR model trained on Urdu, English, Chinese, and Japanese. It uses a Swin encoder and an mBART-50 decoder, and it is trained on over 50M images. It generalizes well to handwritten Urdu, reaching a character error rate (CER) of only 1.2%.

Transformers
ViT
HuggingFace
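
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit distance between the predicted text and the reference text, divided by the reference length. The sketch below illustrates that definition only; it is not the evaluation code used for ViLanOCR, and the function names `edit_distance` and `cer` are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] holds the distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Under this definition, a CER of 1.2% corresponds to roughly 12 character errors per 1,000 reference characters. Note that Python strings are sequences of Unicode code points, so combining marks in Urdu text are counted as separate characters in this sketch.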

BART/LLama 7b SFT Trainer

Implementation for fine-tuning language models on a single GPU using PEFT and the Hugging Face SFT Trainer. The dataset used is Instruct. The largest model that can be trained is Llama 7B.

LLM
PEFT
HuggingFace

LLama2.C

Built a chat UI for llama2.c, the Llama 2 inference implementation in C by Andrej Karpathy. Wrote a socket server in C and an HTML front end that connects to it. The model runs inference on the CPU in real time. Llama can be trained for small tasks.

LLama
C
WebSocket.C

Chatbot GPT3

A chatbot that holds a conversation with a client and collects information from them along the way. Built on a company's rules and regulations data. Used a vector database and prompt engineering to get good results.

Prompt Engineering
OpenAI
Pinecone

POS System

Implemented a POS system that provides live sales statistics from anywhere in the world. The project is deployed in shops. Systems of this kind are scarce in Pakistan; this one helps track sales and detect fraud through integrated intelligent solutions.

Next.js
Express.js
AWS

DBLP-HADOOP

An efficient method for loading a 32 GB XML file into the Hadoop Distributed File System (HDFS) using only 30 MB of system memory. The project also extends to MongoDB integration.

Hadoop
MongoDB
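
The general idea behind parsing a very large XML file in bounded memory can be sketched with Python's standard-library streaming parser. This is an illustration of the technique, not the project's actual code: the function name `iter_records` and the record tag names are assumptions, and the Hadoop and MongoDB steps are omitted.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tags):
    """Yield one dict per record element without building the full tree.

    `source` is a file path or binary file object; `record_tags` is a set
    of element names treated as records (assumed to be children of the root).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in record_tags:
            fields = {}
            for child in elem:
                # Repeated child tags (e.g. several authors) become a list.
                fields.setdefault(child.tag, []).append(child.text or "")
            yield {"type": elem.tag, "fields": fields}
            # Drop already-processed records so memory use stays bounded.
            root.clear()


# Small in-memory example; a real run would pass a file path instead.
sample = (b"<dblp><article><author>A</author><author>B</author>"
          b"<title>T1</title></article>"
          b"<inproceedings><title>T2</title></inproceedings></dblp>")
records = list(iter_records(io.BytesIO(sample), {"article", "inproceedings"}))
```

Each yielded record could then be written out in batches to HDFS or MongoDB. A real DBLP dump also relies on character entities declared in its DTD, which would likely need extra handling beyond this sketch.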

Publications

1. An efficient algorithm for ranking candidates in e-recruitment system
Abdul Hanan Minhas, Mohammad Daniyal Shaiq, Saad Ali Qureshi, Musa Dildar Ahmed Cheema, Shujaat Hussain, Kifayat Ullah Khan
Over the last decade, the growth of e-recruitment has resulted in the expansion of web channels dedicated to candidate recruitment, making it easy to find and apply for jobs. However, as a result, today’s human resource managers are inundated with applications for each job opening. This leads to the production of significant number of documents, referred to as resumes or curriculum vitae (CV).

2. Feature-Wise Ranking of Candidates through Maximum Degrees in Hidden Bipartite Graphs
Sarah Kiyani, Musa Dildar Ahmed Cheema, Saad Ali Qureshi, Shujaat Hussain, Kifayat Ullah Khan
In this day and age of technological breakthroughs, electronic recruitment tools have gained much recognition due to their increasing popularity among recruiters. Many methods like Learning To Rank and Multi-Criteria Decision making have been employed inside these tools to enhance the process. The ranking is one of the most important parts of e-recruitment on which these methods and techniques are applied. Among these methods, the research area of graphs has not been explored enough in the context of ranking.

3. Transformer based Urdu Handwritten Text Optical Character Reader
Sarah Kiyani, Musa Dildar Ahmed Cheema, Saad Ali Qureshi, Shujaat Hussain, Kifayat Ullah Khan
Extracting Handwritten text is one of the most important components of digitizing information and making it available for large scale setting. Handwriting Optical Character Reader (OCR) is a research problem in computer vision and natural language processing computing, and a lot of work has been done for English, but unfortunately, very little work has been done for low resourced languages such as Urdu.