About Me
I'm a passionate Data Engineer and Python Developer with a strong foundation in building and optimizing data pipelines. Currently pursuing a Bachelor's in Programming and Data Science at IIT Madras, I bring hands-on experience with SQL, Python, Spark, AWS, and Azure to solve complex data challenges.
My expertise lies in ELT processes, data warehousing, and building efficient Python backends. In previous projects I've reduced compute costs by 33% and storage costs by up to 60%, optimizing both performance and resource usage.
Key Achievements:
- Built PySpark pipelines handling 22+ tebibytes of data
- Reduced processing time from days to minutes for large-scale data operations
- Implemented cost-saving measures resulting in ₹3 lakhs monthly savings
- Organized a successful tech event at IIT Madras with 250+ registrations
Professional Journey
Leading data engineering initiatives and optimizing large-scale data processing pipelines.
Key Achievements:
- Built a PySpark pipeline to handle 22 tebibytes of data, reducing processing time from 1-2 days to 20-30 minutes.
- Developed an automation tool capable of fetching up to 5 terabytes of data from platforms like Hugging Face and arxiv.org within 60 minutes.
- Implemented a multi-step data pipeline inspired by OBELICS, processing 230M+ web sources and extracting 50-100M Indic images with text (a simplified sketch of this kind of filtering follows this list).
- Optimized compute costs by 33%, saving approximately ₹3 lakhs monthly.
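One way to picture the multi-step pipeline above is as a chain of PySpark filter and transform stages over crawled web metadata. The sketch below is a simplified illustration under assumed inputs: the S3 paths, the column names (url, lang, alt_text, width, height), and the thresholds are placeholders, not the production logic.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("indic-image-metadata-filter")  # hypothetical job name
    .getOrCreate()
)

# Hypothetical input: crawled web-source metadata stored as Parquet (placeholder path).
raw = spark.read.parquet("s3://bucket/web-metadata/")

# Step 1: keep records whose page language tag looks Indic (assumed column `lang`).
indic_langs = ["hi", "bn", "ta", "te", "mr", "gu", "kn", "ml", "pa", "or"]
indic = raw.filter(F.col("lang").isin(indic_langs))

# Step 2: basic quality filters on the image and its surrounding text (assumed columns).
filtered = (
    indic
    .filter((F.col("width") >= 256) & (F.col("height") >= 256))
    .filter(F.length(F.col("alt_text")) > 20)
    .dropDuplicates(["url"])
)

# Step 3: write the surviving image-text pairs back out, partitioned by language
# so downstream jobs can read one language at a time.
filtered.write.mode("overwrite").partitionBy("lang").parquet(
    "s3://bucket/indic-image-text/"  # placeholder output path
)
```

Partitioning the output by language is one way to keep later per-language stages from scanning the full dataset.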
Focused on data pipeline development, migrations, and storage optimization.
Key Achievements:
- Engineered data migrations from data lakes to data warehouses using PySpark on EMR clusters (a simplified sketch of the pattern appears after this list).
- Built and maintained over 50 data pipelines, increasing efficiency by 60%.
- Enhanced data storage efficiency on Azure SQL and AWS Redshift, cutting storage needs by 35%.
- Integrated data from multiple APIs into Azure SQL and AWS Redshift, deploying cloud functions for process automation.
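A minimal sketch of the lake-to-warehouse pattern behind the first bullet above, assuming Parquet in S3 as the lake and a JDBC-reachable warehouse. The paths, table name, and credentials are placeholders; in practice a dedicated Redshift connector (and the matching JDBC driver on the EMR classpath) would be configured.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-to-warehouse-migration").getOrCreate()

# Read one table's worth of data from the lake (placeholder S3 path).
orders = spark.read.parquet("s3://data-lake/orders/")

# Light conformance before loading: typed columns, deduplicated business keys.
conformed = (
    orders
    .withColumn("order_date", F.to_date("order_date"))
    .dropDuplicates(["order_id"])
)

# Load into the warehouse over JDBC (placeholder connection details; the
# Redshift JDBC driver must be available on the cluster classpath).
(
    conformed.write
    .format("jdbc")
    .option("url", "jdbc:redshift://example-cluster:5439/analytics")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")
    .mode("append")
    .save()
)
```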
Specialized in backend development and data analysis for various client projects.
Key Achievements:
- Optimized the airline forecasting process, reducing processing time by 80-90% through efficient use of pandas (see the sketch after this list).
- Reduced storage costs by 60% by transitioning from SQL to NoSQL databases.
- Designed and implemented backend solutions using Django, resolving bugs and issues across client projects.
- Built a PySpark pipeline with caching to accelerate web analytics, achieving millisecond-level response times.
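One common way to get that kind of 80-90% speed-up with pandas is to replace Python-level row loops with vectorized column operations. Whether or not that is exactly what happened here, the sketch below illustrates the general idea; the column names and toy data are made up, not the client's forecasting logic.

```python
import numpy as np
import pandas as pd

# Toy flight-demand data; the real inputs and forecast model are not shown here.
df = pd.DataFrame({
    "route": np.random.choice(["DEL-BOM", "BLR-MAA", "HYD-CCU"], size=1_000_000),
    "bookings": np.random.randint(0, 300, size=1_000_000),
    "capacity": np.random.randint(150, 320, size=1_000_000),
})

# Slow pattern: a Python-level loop via apply, evaluating one row at a time.
# load_factor_slow = df.apply(lambda r: r["bookings"] / r["capacity"], axis=1)

# Vectorized pattern: the same computation expressed on whole columns at once.
df["load_factor"] = df["bookings"] / df["capacity"]

# Grouped aggregations stay vectorized too.
route_summary = df.groupby("route", as_index=False)["load_factor"].mean()
print(route_summary)
```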
Project Portfolio
Key Achievements:
- Reduced processing time from 1-2 days to 20-30 minutes
- Implemented efficient data partitioning and caching strategies (sketched after this list)
- Integrated with AWS services for scalable and cost-effective processing
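A minimal sketch of the partitioning and caching ideas named above, with placeholder paths, column names, and partition counts rather than the project's actual schema or tuning.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("partition-and-cache-demo").getOrCreate()

events = spark.read.parquet("s3://bucket/events/")  # placeholder input

# Repartition on the key that later aggregations group by, so the data is
# shuffled once instead of repeatedly.
by_user = events.repartition(400, "user_id")

# Cache a DataFrame that several downstream aggregations reuse.
by_user.cache()

daily_counts = by_user.groupBy("user_id", F.to_date("ts").alias("day")).count()
top_users = by_user.groupBy("user_id").count().orderBy(F.desc("count")).limit(100)

# Write partitioned by day so consumers can prune to just the dates they need.
daily_counts.write.mode("overwrite").partitionBy("day").parquet(
    "s3://bucket/daily-counts/"  # placeholder output
)
top_users.show()
```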
Key Achievements:
- Processed millions of events per second in real-time (a streaming sketch follows this list)
- Implemented machine learning models for dynamic persona generation
- Designed a scalable architecture to handle increasing data volumes
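The real-time bullet above maps naturally onto a stream-processing engine. The sketch below assumes Kafka plus Spark Structured Streaming, one reasonable stack for this rather than necessarily the one used on the project; the broker, topic, and event schema are placeholders, and the Kafka source requires the spark-sql-kafka package.

```python
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("event-stream-demo").getOrCreate()

# Hypothetical event schema; a real payload would be richer.
schema = T.StructType([
    T.StructField("user_id", T.StringType()),
    T.StructField("event_type", T.StringType()),
    T.StructField("ts", T.TimestampType()),
])

# Consume JSON events from a Kafka topic (placeholder broker and topic names).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "user-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Windowed per-user counts could feed a downstream persona-generation step.
counts = (
    events
    .withWatermark("ts", "1 minute")
    .groupBy(F.window("ts", "1 minute"), "user_id", "event_type")
    .count()
)

query = (
    counts.writeStream
    .outputMode("update")
    .format("console")  # stand-in sink; production would write to a real store
    .start()
)
query.awaitTermination()
```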
Key Achievements:
- Reduced compute costs by 33%, saving approximately ₹3 lakhs monthly
- Processed metadata from 230M+ web sources efficiently
- Implemented advanced filtering techniques for high-quality data extraction
Key Achievements:
- Reduced processing time by 80-90% through efficient use of pandas
- Improved forecast accuracy, leading to better resource allocation
- Implemented automated testing to ensure consistent results (a small test sketch follows)
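A minimal sketch of the kind of automated check behind the last bullet, written with pytest against a hypothetical generate_forecast function; the real suite, inputs, and tolerances are not shown here.

```python
import numpy as np
import pandas as pd
import pytest


def generate_forecast(history: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the real forecasting function."""
    out = history.groupby("route", as_index=False)["demand"].mean()
    return out.rename(columns={"demand": "forecast"})


@pytest.fixture
def history():
    return pd.DataFrame({
        "route": ["DEL-BOM", "DEL-BOM", "BLR-MAA", "BLR-MAA"],
        "demand": [120, 140, 80, 100],
    })


def test_forecast_is_deterministic(history):
    # Running the same inputs twice must produce identical results.
    first = generate_forecast(history)
    second = generate_forecast(history)
    pd.testing.assert_frame_equal(first, second)


def test_forecast_values_within_expected_range(history):
    forecast = generate_forecast(history)
    # Forecasts should stay non-negative and match the historical mean here.
    assert (forecast["forecast"] >= 0).all()
    assert np.isclose(
        forecast.loc[forecast["route"] == "DEL-BOM", "forecast"].iloc[0], 130.0
    )
```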